Add blog post on AKS Configurable Scheduler Profiles #5505

Open
colinmixonn wants to merge 93 commits into master from colinmixonn-patch-4
@colinmixonn

This blog post introduces AKS Configurable Scheduler Profiles, highlighting their benefits for optimizing resource utilization and improving scheduling strategies for web-distributed and AI workloads. It covers configuration examples for GPU utilization, pod distribution across topology domains, and memory-optimized scheduling.

Added a new tag for Scheduler with relevant details.
Updated blog post on AKS Configurable Scheduler Profiles to improve clarity and correctness, including sections on GPU utilization, pod distribution, and memory-optimized scheduling.
Corrected typos and improved clarity in the blog post about AKS Configurable Scheduler Profiles.
Updated the blog to clarify the objectives of configuring AKS Configurable Scheduler Profiles, improved section titles, and ensured consistency in terminology.
Clarified the objectives and improved the wording in the blog post about AKS Configurable Scheduler Profiles.
@colinmixonn colinmixonn marked this pull request as ready for review December 11, 2025 17:52
@colinmixonn colinmixonn requested review from a team, circy9, Copilot, qpetraroia and seanmck and removed request for Copilot December 11, 2025 17:52
@colinmixonn colinmixonn requested a review from palma21 as a code owner December 11, 2025 17:52
Copilot AI review requested due to automatic review settings December 11, 2025 18:00
Copilot AI left a comment

Pull request overview

This pull request adds a new blog post announcing the preview of AKS Configurable Scheduler Profiles, a feature that enables fine-grained control over pod scheduling strategies to optimize resource utilization and improve workload performance.

Key Changes

  • Introduces a new "scheduler" tag to categorize blog posts related to pod placement and scheduling optimization
  • Adds comprehensive blog post covering three main scheduling use cases: GPU bin-packing for AI workloads, pod distribution across topology domains for resilience, and memory-optimized scheduling with PVC-aware placement
  • Provides YAML configuration examples and best practices for implementing custom scheduler profiles

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 20 comments.

| File | Description |
| --- | --- |
| website/blog/tags.yml | Adds new "scheduler" tag for categorizing posts about pod placement and scheduling techniques |
| website/blog/2025-12-16-aks-config-scheduler-profiles-preview/index.md | New blog post introducing AKS Configurable Scheduler Profiles with configuration examples for GPU utilization, topology distribution, and memory-optimized scheduling |

colinmixonn and others added 6 commits December 11, 2025 11:14
…index.md

Co-authored-by: Diego Casati <diego.casati@gmail.com>
…index.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…index.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…index.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…index.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 5 changed files in this pull request and generated 1 comment.

apiVersion: aks.azure.com/v1alpha1
kind: SchedulerConfiguration
metadata:
  name: upstream

Copilot AI Mar 20, 2026

This example uses metadata.name: upstream. If readers apply multiple examples, they'll overwrite the same SchedulerConfiguration object. Use a profile-specific resource name (or call out that the name must be unique per cluster/namespace).

Suggested change
-   name: upstream
+   name: gpu-node-binpacking-scheduler-config

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 5 changed files in this pull request and generated 1 comment.

Comment on lines +32 to +36
This blog provides examples of three different scheduler profiles and details the benefits of each to increase node utilization for AKS clusters:

1. [How to increase AKS cluster GPU utilization](#increase-aks-cluster-gpu-utilization)
2. [How to increase AKS cluster CPU utilization](#increase-aks-cluster-cpu-utilization)

Copilot AI Mar 20, 2026

This says there are "three different scheduler profiles," but only two examples are listed. Either add the third example/section or update the wording to match the actual content.

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 5 changed files in this pull request and generated no new comments.

Updated the blog to reflect the change from three to two scheduler profiles and added an FAQ section addressing common questions about the configurable scheduler profiles.
Expanded the introduction to Configurable Scheduler Profiles on AKS, detailing its benefits and providing examples of two different scheduler profiles to increase node utilization.
Removed redundant sentence and improved clarity in the introduction.
Removed introductory phrase from note about resource weights and parameters.
Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 5 changed files in this pull request and generated 3 comments.

Comment on lines +94 to +99
apiVersion: aks.azure.com/v1alpha1
kind: SchedulerConfiguration
metadata:
  name: upstream
spec:
  rawConfig: |

Copilot AI Mar 20, 2026

This example also uses metadata.name: upstream, which would conflict with the earlier example’s resource name. Please use a unique name here as well (or clarify that the earlier resource should be replaced).
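
One way to sidestep the naming conflict, sketched here under assumptions: if `rawConfig` holds an upstream `KubeSchedulerConfiguration` (which accepts a list of profiles), both examples could live in a single `SchedulerConfiguration` object as separate profiles. The `schedulerName` values below are hypothetical, and whether AKS accepts this exact shape should be verified against the linked AKS documentation.

```yaml
apiVersion: aks.azure.com/v1alpha1
kind: SchedulerConfiguration
metadata:
  name: upstream
spec:
  rawConfig: |
    # Assumption: rawConfig embeds an upstream KubeSchedulerConfiguration.
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
      # Hypothetical profile names, one per example in the post.
      - schedulerName: gpu-node-binpacking-scheduler
        # plugins / pluginConfig for the GPU example go here
      - schedulerName: cpu-node-binpacking-scheduler
        # plugins / pluginConfig for the CPU example go here
```

A pod would then select a profile by setting `spec.schedulerName` to one of those names, so the two examples no longer compete for the same resource name.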

Comment on lines +148 to +152
[concepts-scheduler-configuration]: https://learn.microsoft.com/azure/aks/concepts-scheduler-configuration
[kueue-overview]: https://learn.microsoft.com/azure/aks/kueue-overview
[best-practices-advanced-scheduler]: https://learn.microsoft.com/azure/aks/operator-best-practices-advanced-scheduler
[scheduling-framework/#interfaces]: https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/#interfaces
[supported-in-tree-scheduling-plugins]: https://learn.microsoft.com/azure/aks/concepts-scheduler-configuration#supported-in-tree-scheduling-plugins

Copilot AI Mar 20, 2026

The reference label [scheduling-framework/#interfaces] includes / and #, which can be brittle across Markdown processors. Consider renaming the reference label to a simpler identifier (for example, scheduling-framework-interfaces) while keeping the same URL target.

…index.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 5 changed files in this pull request and generated no new comments.

Updated the description of Configurable Scheduler Profiles to emphasize increased node utilization and clarified the functionality of the scheduling framework. Added information about the accessibility of scheduler configuration starting from Kubernetes version 1.33.
Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 5 changed files in this pull request and generated no new comments.

Enhanced the explanation of Kubernetes scheduler operations and the benefits of Configurable Scheduler Profiles on AKS. Clarified the impact of scheduling strategies on resource utilization and operational complexity.
Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 6 changed files in this pull request and generated 1 comment.

apiVersion: aks.azure.com/v1alpha1
kind: SchedulerConfiguration
metadata:
  name: upstream

Copilot AI Mar 20, 2026

This example uses metadata.name: upstream, and the CPU example later uses the same name. If readers apply both, they will conflict/overwrite. Consider giving each example a unique metadata.name to avoid copy/paste issues.

Suggested change
-   name: upstream
+   name: gpu-node-binpacking-scheduler

Added section on increasing node utilization and operator control with configurable scheduler profiles.
Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 6 changed files in this pull request and generated 2 comments.

- name: NodeResourcesBalancedAllocation
pluginConfig:
- name: NodeResourcesFit
  args:

Copilot AI Mar 20, 2026

The GPU example’s NodeResourcesFit args omits the typed wrapper (apiVersion/kind: NodeResourcesFitArgs) that you include in the CPU example. For documentation copy/paste reliability, keep both examples consistent with the kube-scheduler config schema (add apiVersion/kind in the first example as well) to avoid readers hitting validation/parsing errors depending on tooling.

Suggested change
-   args:
+   args:
+     apiVersion: kubescheduler.config.k8s.io/v1
+     kind: NodeResourcesFitArgs
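
For context, a fuller fragment with the typed wrapper might look like the following. This is a sketch against the upstream `kubescheduler.config.k8s.io/v1` schema; the `MostAllocated` strategy, the `nvidia.com/gpu` resource name, and the weight are illustrative assumptions rather than values taken from the post.

```yaml
pluginConfig:
  - name: NodeResourcesFit
    args:
      # Typed wrapper: declares which args schema the fields below follow.
      apiVersion: kubescheduler.config.k8s.io/v1
      kind: NodeResourcesFitArgs
      scoringStrategy:
        # Assumed for illustration: favor nodes that are already heavily allocated.
        type: MostAllocated
        resources:
          - name: nvidia.com/gpu
            weight: 1
```

Keeping the same wrapper in both the GPU and CPU examples means readers can copy either block without wondering which form the parser expects.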

score: 0
```
### FAQ
1. How does this interact with autoscalers such as Node Auto Provisioning (NAP), Cluster Autoscaler (CAS), and Vertical Pod Autoscaler (VPA)?

Copilot AI Mar 20, 2026

FAQ item #1 is a question without an answer. Either add an explicit answer (even if it’s a short guidance + link) or remove the question to avoid confusing readers.

Suggested change
- 1. How does this interact with autoscalers such as Node Auto Provisioning (NAP), Cluster Autoscaler (CAS), and Vertical Pod Autoscaler (VPA)?
+ 1. How does this interact with autoscalers such as Node Auto Provisioning (NAP), Cluster Autoscaler (CAS), and Vertical Pod Autoscaler (VPA)? Configurable Scheduler Profiles control how pods are placed on existing nodes, while autoscalers still decide when to add, remove, or resize nodes based on resource demand. You should validate new profiles with your current NAP, CAS, and VPA settings in a test cluster to ensure they work together as expected before rolling them out to production.

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 6 changed files in this pull request and generated no new comments.

6 participants