fix Azure profile-aware AKS Flex bootstrap #54

Merged
bcho merged 6 commits into main from fix/node-bootstrap-script-held-packages
Mar 17, 2026

qike-ms (Collaborator) commented Mar 14, 2026

Summary

  • make AKS Flex config resolution honor AZURE_CONFIG_DIR when deriving the default Azure subscription ID
  • make node bootstrap use the tenant from the selected Azure CLI profile instead of assuming the default profile tenant
  • install Cilium in AKS BYOCNI mode using kubeconfig-derived API server settings so Flex cluster creation works with non-default Azure CLI profiles

Why

AKS Flex cluster creation and node bootstrap could silently use the wrong Azure CLI profile when users isolate multiple Azure accounts with different AZURE_CONFIG_DIR folders. That broke Flex cluster creation and external node bootstrap for clusters created from a non-default Azure profile.

Validation

  • go test ./internal/config/... ./internal/aks/deploy/... in cli
  • go test ./pkg/util/config/... in plugin
  • end-to-end recreated a Flex cluster in a non-default Azure profile and joined external Linux nodes successfully

@qike-ms qike-ms requested review from anson627, bcho and Copilot and removed request for anson627 and Copilot March 16, 2026 17:46
Copilot AI (Contributor) left a comment

Pull request overview

Improves AKS Flex bootstrap and related tooling to correctly respect non-default Azure CLI profiles (notably when users set AZURE_CONFIG_DIR), avoiding accidental use of the default profile during cluster creation, node bootstrap, and Cilium install.

Changes:

  • Make default Azure subscription ID resolution read clouds.config from AZURE_CONFIG_DIR (with fallback to $HOME/.azure).
  • Make kubeadm defaults pick the tenant ID from the Azure CLI profile under AZURE_CONFIG_DIR when creating AzureCLICredential.
  • Install Cilium in AKS BYOCNI mode using kubeconfig-derived API server host/port, and update node bootstrap apt install flags.
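
The directory lookup described in the first change above can be sketched roughly as follows. This is a minimal illustration and not the code from this pull request: `azureConfigDir` and `cloudsConfigPath` are hypothetical names, and the environment lookup is passed in as a function only so the logic can be exercised without modifying the process environment.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// azureConfigDir returns the directory the Azure CLI profile is read from.
// It prefers AZURE_CONFIG_DIR and falls back to <home>/.azure.
// Hypothetical helper; getenv is injected to keep the function easy to test.
func azureConfigDir(getenv func(string) string, homeDir string) string {
	if dir := getenv("AZURE_CONFIG_DIR"); dir != "" {
		return dir
	}
	return filepath.Join(homeDir, ".azure")
}

// cloudsConfigPath returns the path of clouds.config inside the selected
// profile directory. Hypothetical helper for illustration only.
func cloudsConfigPath(getenv func(string) string, homeDir string) string {
	return filepath.Join(azureConfigDir(getenv, homeDir), "clouds.config")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine home directory:", err)
		os.Exit(1)
	}
	fmt.Println(cloudsConfigPath(os.Getenv, home))
}
```

Running this with AZURE_CONFIG_DIR pointing at an isolated profile folder prints the clouds.config path inside that folder; with the variable unset, it prints the path under the home directory.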

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| plugin/pkg/util/config/config.go | Honor AZURE_CONFIG_DIR when reading clouds.config for default subscription ID. |
| plugin/pkg/util/config/config_test.go | Adds test coverage for AZURE_CONFIG_DIR subscription resolution. |
| cli/internal/config/nodebootstrap/assets/script.sh.tmpl | Adjusts apt install command to allow changing held packages. |
| cli/internal/config/nodebootstrap/script_test.go | Updates expected rendered script line for apt install flag change. |
| cli/internal/config/configcmd/defaults.go | Uses Azure CLI profile tenant (from azureProfile.json) when building Azure CLI credential for kubeadm defaults. |
| cli/internal/config/configcmd/defaults_test.go | Adds test coverage for tenant ID resolution from AZURE_CONFIG_DIR profile. |
| cli/internal/aks/deploy/cilium.go | Switches to BYOCNI-oriented Cilium install flow and derives API server host/port from kubeconfig. |
| cli/internal/aks/deploy/cilium_test.go | Adds test coverage for kubeconfig API server parsing. |

Comment on lines +35 to +52
    k8sServiceHost, k8sServicePort, err := kubeconfigAPIServer(kubeconfigFile)
    if err != nil {
        return err
    }

    cmd := exec.CommandContext(
        ctx,
        "cilium", "install",
        "--set", "azure.resourceGroup="+cfg.ResourceGroupName,
        "--kubeconfig", kubeconfigFile,
        "--context", cfg.ClusterName+"-admin",
        "--namespace", "kube-system",
        "--datapath-mode", "aks-byocni",
        "--helm-set", "aksbyocni.enabled=true",
        "--helm-set", "cluster.name="+cfg.ClusterName,
        "--helm-set", "operator.replicas=1",
        "--helm-set", "kubeProxyReplacement=true",
        "--helm-set", "k8sServiceHost="+k8sServiceHost,
        "--helm-set", "k8sServicePort="+k8sServicePort,

bcho (Member) left a comment

return cfg
}

func azureConfigTenantID() string {

It is actually easier to extract the tenant ID from the output of az account show --query 'tenantId' -o tsv. This avoids adding a hard dependency on the az CLI's internal state.

@copilot
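
The suggestion above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code that landed in this pull request: `tenantIDFromAzCLI` and `parseTenantID` are hypothetical names. The child process inherits the parent's environment, so an AZURE_CONFIG_DIR set by the caller is passed through to az without extra handling.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// parseTenantID trims the TSV output of `az account show` down to the bare
// tenant ID and rejects empty output. Hypothetical helper.
func parseTenantID(out []byte) (string, error) {
	tenantID := strings.TrimSpace(string(out))
	if tenantID == "" {
		return "", errors.New("az account show returned an empty tenant ID")
	}
	return tenantID, nil
}

// tenantIDFromAzCLI asks the az CLI for the tenant of the active account.
// exec.CommandContext leaves Env unset, so the current process environment
// (including AZURE_CONFIG_DIR, if set) is inherited by az.
func tenantIDFromAzCLI(ctx context.Context) (string, error) {
	out, err := exec.CommandContext(
		ctx, "az", "account", "show", "--query", "tenantId", "-o", "tsv",
	).Output()
	if err != nil {
		return "", fmt.Errorf("az account show: %w", err)
	}
	return parseTenantID(out)
}

func main() {
	tenantID, err := tenantIDFromAzCLI(context.Background())
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(tenantID)
}
```

Keeping the output parsing separate from the process invocation means the parsing can be unit tested on machines where az is not installed.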

We should also move this function to pkg/util/config. @copilot

I suppose these changes are already included in 2a5060a? Not sure why they still show up in this PR. Maybe we should rebase here. @copilot

return cmd.Run()
}

func kubeconfigAPIServer(kubeconfigFile string) (string, string, error) {

We can potentially use the implementation under

func Kubeconfig(ctx context.Context, credentials azcore.TokenCredential, cfg *config.Config) (*api.Config, error) {
to simplify this implementation. I would also prefer moving this function there. @copilot
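
For context, the host/port derivation that this function performs can be sketched with the standard library alone. The sketch assumes the cluster's server URL has already been read from the kubeconfig; `splitAPIServer` is a hypothetical name, the URL in `main` is made up, and defaulting to port 443 for https is an assumption rather than confirmed behavior of this pull request.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// splitAPIServer splits a kubeconfig cluster "server" URL into the host and
// port values passed to Cilium as k8sServiceHost and k8sServicePort.
// Hypothetical helper; the default ports below are an assumption.
func splitAPIServer(server string) (host, port string, err error) {
	u, err := url.Parse(server)
	if err != nil {
		return "", "", fmt.Errorf("parse API server URL %q: %w", server, err)
	}
	host = u.Hostname()
	if host == "" {
		return "", "", fmt.Errorf("API server URL %q has no host", server)
	}
	port = u.Port()
	if port == "" {
		// No explicit port in the URL: fall back to the scheme default.
		switch u.Scheme {
		case "https":
			port = "443"
		case "http":
			port = "80"
		default:
			return "", "", fmt.Errorf("API server URL %q has no port and an unknown scheme", server)
		}
	}
	return host, port, nil
}

func main() {
	// Made-up server URL for illustration only.
	host, port, err := splitAPIServer("https://myaks.example.invalid:443")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(host, port)
}
```

Reading the server URL from the kubeconfig itself (including context selection) would rely on the Kubernetes client libraries, which is the part the review comment proposes to share with the existing Kubeconfig helper.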

Copilot AI (Contributor) commented Mar 17, 2026

@bcho I've opened a new pull request, #55, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 3 commits March 16, 2026 18:43
…ve kubeconfig/k8s helpers to shared packages (#55)

* Initial plan

* Address review feedback: use az CLI for tenant ID, move utility functions to shared packages

Co-authored-by: bcho <1975118+bcho@users.noreply.github.com>

* Use az CLI for subscription ID; fix indentation in APIServerFromKubeconfigFile

Co-authored-by: bcho <1975118+bcho@users.noreply.github.com>

* Add contextName parameter to APIServerFromKubeconfigFile; pass context in cilium deploy

Co-authored-by: bcho <1975118+bcho@users.noreply.github.com>

* Extract clusterContext variable in deployCilium; reuse across API server lookup and cilium install flags

Co-authored-by: bcho <1975118+bcho@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bcho <1975118+bcho@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
bcho (Member) left a comment

LGTM

bcho (Member) left a comment

@copilot run go mod tidy across all submodules

Copilot AI (Contributor) commented Mar 17, 2026

@bcho I've opened a new pull request, #56, to work on those changes. Once the pull request is ready, I'll request review from you.

* Initial plan

* Run go mod tidy across all submodules (cli, karpenter, plugin)

Co-authored-by: bcho <1975118+bcho@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bcho <1975118+bcho@users.noreply.github.com>
@bcho bcho merged commit 09c56bf into main Mar 17, 2026
9 checks passed
@bcho bcho deleted the fix/node-bootstrap-script-held-packages branch March 17, 2026 01:58
4 participants