Skip to content

CORS-4235: Machine Management API Field & Feature Gate #10402

Open
patrickdillon wants to merge 2 commits intoopenshift:mainfrom
patrickdillon:mapicapi-ic
Open

CORS-4235: Machine Management API Field & Feature Gate #10402
patrickdillon wants to merge 2 commits intoopenshift:mainfrom
patrickdillon:mapicapi-ic

Conversation

@patrickdillon
Copy link
Contributor

Adds an install config field to allow users to specify day-2 machine management API. Also adds feature gates to limit usage of this field.

An example install config would look like:

apiVersion: v1
baseDomain: devcluster.openshift.com
controlPlane:
  management: ClusterAPI
compute:
- name: worker
  management: ClusterAPI
featureSet: DevPreviewNoUpgrade

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 17, 2026
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Mar 17, 2026

@patrickdillon: This pull request references CORS-4235 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Adds an install config field to allow users to specify day-2 machine management API. Also adds feature gates to limit usage of this field.

An example install config would look like:

apiVersion: v1
baseDomain: devcluster.openshift.com
controlPlane:
 management: ClusterAPI
compute:
- name: worker
 management: ClusterAPI
featureSet: DevPreviewNoUpgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link

coderabbitai bot commented Mar 17, 2026

Walkthrough

Adds a MachinePool Management field (enum MachineManagementAPI) to select machine-management API, updates feature-gate validation to enforce ClusterAPI usage for control plane/compute pools, adds tests for these gates, and exposes the new field in the InstallConfig CRD schema.

Changes

Cohort / File(s) Summary
Machine Pool Type Definitions
pkg/types/machinepools.go
Added Management MachineManagementAPI to MachinePool, introduced type MachineManagementAPI string and constants ClusterAPI, MachineAPI with kubebuilder tags and JSON management key.
Feature Gate Validation Logic
pkg/types/validation/featuregates.go, pkg/types/validation/installconfig.go
Renamed/relocated gated-feature function to validateMachinePoolFeatureGates, added two gates for ClusterAPI control plane and compute installs, and updated installconfig validation to call the new validator instead of the previous source.
Feature Gate Tests
pkg/types/validation/featuregate_test.go
Added eight test cases (four scenarios duplicated) covering ClusterAPI management allowed/denied for control plane and compute pools under DevPreviewNoUpgrade and Default feature sets.
CRD Schema / Defaults
data/data/install.openshift.io_installconfigs.yaml
Added management string property to machine pool schema sections with default ClusterAPI, description, enum [ClusterAPI,MachineAPI], and placement in the InstallConfig OpenAPIv3 schema.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

Comment on lines +133 to +136
// Using this field for control plane machines requires the ClusterAPIControlPlaneInstall feature gate.
// Using this field for compute machines requires the ClusterAPIComputeInstall feature gate.
// Supported platforms: aws
// Default is ClusterAPI.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once this gate is promoted we can remove these comments, but we will add platform-specific gates for new platforms.

@patrickdillon
Copy link
Contributor Author

CAPI compute PR is growing too big, so I will break it into smaller PRs. I can post the larger PR for reference, but I'm still working through some pieces.

@patrickdillon
Copy link
Contributor Author

cc @damdo @JoelSpeed

@openshift-ci openshift-ci bot requested review from rwsu and tthvo March 17, 2026 18:37
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mtulio for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/types/machinepools.go`:
- Around line 132-141: The new Management field is incorrectly defaulting to
ClusterAPI; change its kubebuilder default so omitted values preserve today's
MachineAPI behavior by either removing the +kubebuilder:default=ClusterAPI line
or replacing it with +kubebuilder:default=MachineAPI for the Management
MachineManagementAPI `management` field in pkg/types/machinepools.go; ensure the
validation enum still allows both ClusterAPI and MachineAPI and that any related
logic/tests expecting MachineAPI remain unchanged.

In `@pkg/types/validation/featuregates.go`:
- Around line 48-51: The gate currently only inspects c.Compute[0] and reports a
non-indexed path; update the FeatureGateClusterAPIComputeInstall check to
iterate over all compute pools in c.Compute, trigger the condition if any pool's
Management equals types.ClusterAPI, and set the Field to an indexed path for the
offending pool (use the loop index to build a field path like
"compute[i].management" via
field.NewPath("compute").Index(i).Child("management") or equivalent). Ensure you
reference the existing symbols
FeatureGateName/features.FeatureGateClusterAPIComputeInstall,
Condition/c.Compute, types.ClusterAPI, and field.NewPath("compute",
"management") when making the change.

In `@pkg/types/validation/installconfig.go`:
- Line 1654: The error message for rejected ClusterAPI management feature-gate
configs is using a generic template that directs users to enable
TechPreviewNoUpgrade or CustomNoUpgrade; update the code path that appends
validateMachinePoolFeatureGates (the call to validateMachinePoolFeatureGates and
the template it returns) so that when the failing gate is the ClusterAPI
management gate it emits a gate-specific remediation message recommending
DevPreviewNoUpgrade (or the correct gate) instead of the shared
TechPreview/Custom suggestion; locate and adjust the message construction inside
validateMachinePoolFeatureGates (or the shared template function it calls) to
return the ClusterAPI-specific guidance.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 63004412-efc9-4cf4-996c-f6da0433245e

📥 Commits

Reviewing files that changed from the base of the PR and between 1896803 and 860ee27.

📒 Files selected for processing (4)
  • pkg/types/machinepools.go
  • pkg/types/validation/featuregate_test.go
  • pkg/types/validation/featuregates.go
  • pkg/types/validation/installconfig.go

Comment on lines +132 to +141
// Management is the API that will be used for machine management in the cluster.
// Using this field for control plane machines requires the ClusterAPIControlPlaneInstall feature gate.
// Using this field for compute machines requires the ClusterAPIComputeInstall feature gate.
// Supported platforms: aws
// Default is ClusterAPI.
//
// +kubebuilder:default=ClusterAPI
// +kubebuilder:validation:Enum=ClusterAPI;MachineAPI
// +optional
Management MachineManagementAPI `json:"management,omitempty"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't default the new field to ClusterAPI.

ClusterAPI is the gated opt-in path introduced here. Making management default to ClusterAPI flips omitted values onto the new behavior instead of preserving today's MachineAPI behavior.

Suggested fix
-	// Using this field for control plane machines requires the ClusterAPIControlPlaneInstall feature gate.
-	// Using this field for compute machines requires the ClusterAPIComputeInstall feature gate.
+	// Setting this field to ClusterAPI for control plane machines requires the ClusterAPIControlPlaneInstall feature gate.
+	// Setting this field to ClusterAPI for compute machines requires the ClusterAPIComputeInstall feature gate.
 	// Supported platforms: aws
-	// Default is ClusterAPI.
+	// Default is MachineAPI.
 	//
-	// +kubebuilder:default=ClusterAPI
+	// +kubebuilder:default=MachineAPI
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/machinepools.go` around lines 132 - 141, The new Management field
is incorrectly defaulting to ClusterAPI; change its kubebuilder default so
omitted values preserve today's MachineAPI behavior by either removing the
+kubebuilder:default=ClusterAPI line or replacing it with
+kubebuilder:default=MachineAPI for the Management MachineManagementAPI
`management` field in pkg/types/machinepools.go; ensure the validation enum
still allows both ClusterAPI and MachineAPI and that any related logic/tests
expecting MachineAPI remain unchanged.

Comment on lines +48 to +51
{
FeatureGateName: features.FeatureGateClusterAPIComputeInstall,
Condition: len(c.Compute) > 0 && c.Compute[0].Management == types.ClusterAPI,
Field: field.NewPath("compute", "management"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Check every compute pool and use an indexed field path.

management is per machine pool, but this only inspects c.Compute[0]. A later compute pool can set ClusterAPI without hitting the gate, and the error is reported as compute.management instead of compute[n].management.

Suggested fix
 func validateMachinePoolFeatureGates(c *types.InstallConfig) []featuregates.GatedInstallConfigFeature {
-	return []featuregates.GatedInstallConfigFeature{
+	gated := []featuregates.GatedInstallConfigFeature{
 		{
 			FeatureGateName: features.FeatureGateDualReplica,
 			Condition:       c.ControlPlane != nil && c.ControlPlane.Fencing != nil,
 			Field:           field.NewPath("platform", "none", "fencingCredentials"),
 		},
 		{
 			FeatureGateName: features.FeatureGateMultiDiskSetup,
 			Condition:       c.ControlPlane != nil && len(c.ControlPlane.DiskSetup) != 0,
 			Field:           field.NewPath("controlPlane", "diskSetup"),
 		},
 		{
 			FeatureGateName: features.FeatureGateMultiDiskSetup,
 			Condition: func() bool {
 				computeMachinePool := c.Compute
 				for _, compute := range computeMachinePool {
 					if len(compute.DiskSetup) != 0 {
 						return true
 					}
 				}
 				return false
 			}(),
 			Field: field.NewPath("compute", "diskSetup"),
 		},
 		{
 			FeatureGateName: features.FeatureGateOSStreams,
 			Condition:       len(c.OSImageStream) != 0,
 			Field:           field.NewPath("osImageStream"),
 		},
 		{
 			FeatureGateName: features.FeatureGateClusterAPIControlPlaneInstall,
 			Condition:       c.ControlPlane != nil && c.ControlPlane.Management == types.ClusterAPI,
 			Field:           field.NewPath("controlPlane", "management"),
 		},
-		{
-			FeatureGateName: features.FeatureGateClusterAPIComputeInstall,
-			Condition:       len(c.Compute) > 0 && c.Compute[0].Management == types.ClusterAPI,
-			Field:           field.NewPath("compute", "management"),
-		},
 	}
+
+	for i := range c.Compute {
+		gated = append(gated, featuregates.GatedInstallConfigFeature{
+			FeatureGateName: features.FeatureGateClusterAPIComputeInstall,
+			Condition:       c.Compute[i].Management == types.ClusterAPI,
+			Field:           field.NewPath("compute").Index(i).Child("management"),
+		})
+	}
+
+	return gated
-	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/validation/featuregates.go` around lines 48 - 51, The gate
currently only inspects c.Compute[0] and reports a non-indexed path; update the
FeatureGateClusterAPIComputeInstall check to iterate over all compute pools in
c.Compute, trigger the condition if any pool's Management equals
types.ClusterAPI, and set the Field to an indexed path for the offending pool
(use the loop index to build a field path like "compute[i].management" via
field.NewPath("compute").Index(i).Child("management") or equivalent). Ensure you
reference the existing symbols
FeatureGateName/features.FeatureGateClusterAPIComputeInstall,
Condition/c.Compute, types.ClusterAPI, and field.NewPath("compute",
"management") when making the change.

if c.ControlPlane != nil {
gatedFeatures = append(gatedFeatures, defaultsvalidation.GatedFeatures(c)...)
}
gatedFeatures = append(gatedFeatures, validateMachinePoolFeatureGates(c)...)
Copy link

@coderabbitai coderabbitai bot Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use a gate-specific error message for ClusterAPI management.

These new gates are allowed by DevPreviewNoUpgrade in this PR, but the shared template below still tells users to enable TechPreviewNoUpgrade or CustomNoUpgrade. Rejected management: ClusterAPI configs will point people at the wrong remediation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/types/validation/installconfig.go` at line 1654, The error message for
rejected ClusterAPI management feature-gate configs is using a generic template
that directs users to enable TechPreviewNoUpgrade or CustomNoUpgrade; update the
code path that appends validateMachinePoolFeatureGates (the call to
validateMachinePoolFeatureGates and the template it returns) so that when the
failing gate is the ClusterAPI management gate it emits a gate-specific
remediation message recommending DevPreviewNoUpgrade (or the correct gate)
instead of the shared TechPreview/Custom suggestion; locate and adjust the
message construction inside validateMachinePoolFeatureGates (or the shared
template function it calls) to return the ClusterAPI-specific guidance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is true, but this is an existing issue with the validation. not sure if we want to fix it in this scope

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patrickdillon Fair point — since the template is shared across all gated features, fixing it cleanly would require per-gate message customization, which is broader than this PR's intent. Would you like me to open a follow-up GitHub issue to track improving the error message for ClusterAPI management feature-gate rejections?

Adds a management field to machinepools to allow toggling day-2
machine management between MachineAPI and ClusterAPI.
This commit adds validation to ensure that the appropriate feature
gate is enabled to set the management field on the machine pool.
For some reason the machine pool feature gate validation was in
the defaults package, so this moves it to the validation package.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@data/data/install.openshift.io_installconfigs.yaml`:
- Around line 146-157: The schema currently sets an unconditional default of
"ClusterAPI" on the management field which is AWS-only and feature-gated; remove
the default entry from the management schema (delete the default: ClusterAPI
line) so the CRD does not auto-materialize an unsupported value, and make the
same removal in the arbiter, compute, and controlPlane management schemas
(ensure enum and type remain intact); retain any platform/feature-aware
defaulting logic in server-side or controller code rather than in the CRD
schema.
- Around line 146-157: The arbiter schema exposes the management field
(arbiter.management) but the description only mentions control-plane and compute
feature gates; either add explicit arbiter semantics to the description (state
whether arbiter supports ClusterAPI and/or MachineAPI, list supported platforms
for arbiter, and identify which feature gate applies for arbiter installation)
or remove the management field from the arbiter schema until arbiter behavior
and gate are defined; ensure the enum (ClusterAPI, MachineAPI) and type: string
remain consistent with the documented arbiter behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d1607282-fa69-4744-83de-a10c482ea042

📥 Commits

Reviewing files that changed from the base of the PR and between 860ee27 and 8109a03.

📒 Files selected for processing (5)
  • data/data/install.openshift.io_installconfigs.yaml
  • pkg/types/machinepools.go
  • pkg/types/validation/featuregate_test.go
  • pkg/types/validation/featuregates.go
  • pkg/types/validation/installconfig.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/types/validation/installconfig.go
  • pkg/types/machinepools.go

Comment on lines +146 to +157
management:
default: ClusterAPI
description: |-
Management is the API that will be used for machine management in the cluster.
Using this field for control plane machines requires the ClusterAPIControlPlaneInstall feature gate.
Using this field for compute machines requires the ClusterAPIComputeInstall feature gate.
Supported platforms: aws
Default is ClusterAPI.
enum:
- ClusterAPI
- MachineAPI
type: string
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't make an AWS-only, gated path the unconditional schema default.

These blocks say management is AWS-only and feature-gated, but the schema still defaults it to ClusterAPI. A CRD default cannot express those conditions, so any consumer that honors schema defaults can materialize an unsupported/gated value outside the valid context.

Minimal safe fix
-                default: ClusterAPI
                 description: |-
                   Management is the API that will be used for machine management in the cluster.
                   Using this field for control plane machines requires the ClusterAPIControlPlaneInstall feature gate.
                   Using this field for compute machines requires the ClusterAPIComputeInstall feature gate.
                   Supported platforms: aws
-                  Default is ClusterAPI.

Apply the same change to the arbiter, compute, and controlPlane schemas, and keep any defaulting in code that can evaluate platform and feature-gate state.

Also applies to: 1718-1729, 3230-3241

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/data/install.openshift.io_installconfigs.yaml` around lines 146 - 157,
The schema currently sets an unconditional default of "ClusterAPI" on the
management field which is AWS-only and feature-gated; remove the default entry
from the management schema (delete the default: ClusterAPI line) so the CRD does
not auto-materialize an unsupported value, and make the same removal in the
arbiter, compute, and controlPlane management schemas (ensure enum and type
remain intact); retain any platform/feature-aware defaulting logic in
server-side or controller code rather than in the CRD schema.

⚠️ Potential issue | 🟠 Major

Define arbiter semantics before exposing arbiter.management.

This field is published under arbiter, but the description only explains control-plane and compute gating. As written, the public schema does not tell users whether arbiter is supported here or which gate applies. Either document arbiter behavior explicitly or leave this field out of the arbiter schema for now.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/data/install.openshift.io_installconfigs.yaml` around lines 146 - 157,
The arbiter schema exposes the management field (arbiter.management) but the
description only mentions control-plane and compute feature gates; either add
explicit arbiter semantics to the description (state whether arbiter supports
ClusterAPI and/or MachineAPI, list supported platforms for arbiter, and identify
which feature gate applies for arbiter installation) or remove the management
field from the arbiter schema until arbiter behavior and gate are defined;
ensure the enum (ClusterAPI, MachineAPI) and type: string remain consistent with
the documented arbiter behavior.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 17, 2026

@patrickdillon: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@JoelSpeed
Copy link
Contributor

Changes here LGTM
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants