
CCO-738, CCO-739: Set operator condition to Progressing when pod identity webhook pod are updating#978

Open
dlom wants to merge 1 commit into openshift:master from dlom:CCO-738

Conversation

@dlom
Contributor

@dlom dlom commented Feb 26, 2026

xref: CCO-738
xref: CCO-739
/assign @jstuever

The Pod Identity Webhook controller will now check all pods owned by its deployment for their image version, and report Progressing if the number of pods doesn't match the desired replica count, or if any of the pods is running an older image.
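As a rough illustration of the behavior described above, the check can be thought of as two predicates over the deployment's replica counts. This is a hedged sketch: the `webhookPodStatus` type name comes from the review walkthrough below, but its fields and method bodies here are illustrative assumptions, not the PR's actual code.

```go
package main

import "fmt"

// webhookPodStatus mirrors the shape described in the review walkthrough;
// the fields and logic here are assumptions for illustration only.
type webhookPodStatus struct {
	desiredReplicas   int32
	availableReplicas int32
	updatedReplicas   int32
}

// Available reports whether the desired number of replicas is serving.
func (s webhookPodStatus) Available() bool {
	return s.desiredReplicas > 0 && s.availableReplicas >= s.desiredReplicas
}

// Progressing reports whether a rollout is still replacing old pods.
func (s webhookPodStatus) Progressing() bool {
	return s.updatedReplicas < s.desiredReplicas
}

func main() {
	// Mid-rollout: all replicas are serving, but one still runs the old image.
	mid := webhookPodStatus{desiredReplicas: 2, availableReplicas: 2, updatedReplicas: 1}
	fmt.Println(mid.Available(), mid.Progressing())
}
```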

Summary by CodeRabbit

  • New Features

    • Pod identity webhook deployment status tracking with automatic detection of availability and rollout progress.
  • Improvements

    • Wider context propagation across status and reconciliation flows for better cancellation/timeouts and more robust client operations.
    • Status progression explicitly set when operator versions change.
    • Added synchronization around condition handling to improve reliability during concurrent reconciliations.
  • Tests

    • Expanded and reorganized tests, adding coverage for webhook status checks and contextualized subtests.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 26, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 26, 2026

@dlom: This pull request references CCO-738 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

xref: CCO-738
xref: CCO-739
/assign @jstuever

the Pod Identity Webhook controller will now check all pods owned by its deployment for their image version, and indicate progressing if the number of pods doesn't match the desired replicas, or if any of the pods have an older image

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from jstuever and suhanime February 26, 2026 21:18
@openshift-ci
Contributor

openshift-ci bot commented Feb 26, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dlom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dlom
Contributor Author

dlom commented Feb 26, 2026

/retitle CCO-738, CCO-739: Set operator condition to Progressing when pod identity webhook pod are updating

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 26, 2026
@openshift-ci openshift-ci bot changed the title CCO-738: Set operator condition to Progressing when pod identity webhook pod are updating CCO-738, CCO-739: Set operator condition to Progressing when pod identity webhook pod are updating Feb 26, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 26, 2026

@dlom: This pull request references CCO-738 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

This pull request references CCO-739 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

xref: CCO-738
xref: CCO-739
/assign @jstuever

the Pod Identity Webhook controller will now check all pods owned by its deployment for their image version, and indicate progressing if the number of pods doesn't match the desired replicas, or if any of the pods have an older image


@coderabbitai

coderabbitai bot commented Mar 9, 2026

Walkthrough

Added context.Context parameters to many Handler/GetConditions methods and propagated ctx through status reconciliation and client calls; introduced pod-identity-webhook pod-status checks, synchronization for condition access, and new condition generation; updated call sites and tests accordingly.

Changes

Cohort / File(s) Summary
Global GetConditions context propagation
pkg/operator/cleanup/status.go, pkg/operator/loglevel/status.go, pkg/operator/secretannotator/status/status.go, pkg/operator/credentialsrequest/status.go
Changed GetConditions signatures to GetConditions(ctx context.Context, logger ...) and imported context. credentialsrequest/status.go also updated internal getOperatorState(ctx ...) and client List/Get calls to use ctx.
Status controller updates
pkg/operator/status/status_controller.go, pkg/operator/status/status_controller_test.go
Updated Handler interface GetConditions(ctx, logger), syncStatus(ctx, ...) and Reconcile to propagate ctx; Kubernetes client Get/Create/Status().Update now use ctx; tests refactored to subtests and handler signature adjusted; Progressing condition set explicitly on version change.
CredentialsRequest controllers & tests
pkg/operator/credentialsrequest/credentialsrequest_controller.go, pkg/operator/credentialsrequest/..._test.go
Reconciler GetConditions signatures now accept ctx; reconciler List/Get calls use ctx; tests updated to call GetConditions(context.TODO(), logger).
Pod identity webhook implementation
pkg/operator/podidentity/podidentitywebhook_controller.go
Added conditionsMutex *sync.RWMutex, webhookPodStatus type and methods (Available(), Progressing()), CheckPodStatus(ctx) to inspect Deployment status, new reason constants, and updated GetConditions(ctx, logger) to use pod status and guarded access.
Pod identity webhook tests
pkg/operator/podidentity/podidentitywebhook_controller_test.go
New tests (TestCheckPodStatus, TestWebhookPodStatus, TestGetConditions) and helpers; test reconciler initialized with conditionsMutex; coverage for replica-state translations and condition generation.
Misc credential tests & label handler
pkg/operator/credentialsrequest/credentialsrequest_controller_label_test.go, pkg/operator/credentialsrequest/credentialsrequest_controller_*.go (tests)
All credential-related tests updated to pass a context to GetConditions; minor test wiring adjustments.
Status-related misc changes
pkg/operator/cleanup/status.go (also listed above)
Signatures updated to accept ctx; callers/tests adjusted. No other logic changes beyond context propagation.

Sequence Diagram(s)

sequenceDiagram
    participant StatusController as StatusController
    participant PodHandler as PodIdentityHandler
    participant KubeAPI as KubernetesAPI
    participant StatusStore as ClusterOperatorStatus

    rect rgba(63,81,181,0.5)
    StatusController->>PodHandler: GetConditions(ctx, logger)
    end

    rect rgba(0,150,136,0.5)
    PodHandler->>KubeAPI: CheckPodStatus(ctx) — Get Deployment & PDB
    KubeAPI-->>PodHandler: Deployment & PDB status
    PodHandler->>PodHandler: compute Available / Progressing
    PodHandler-->>StatusController: conditions
    end

    rect rgba(255,152,0,0.5)
    StatusController->>StatusStore: Create/Update(ctx, ClusterOperator with conditions)
    StatusStore-->>StatusController: ack / error
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name: Docstring Coverage
Status: ⚠️ Warning
Explanation: Docstring coverage is 5.56%, which is below the required threshold of 80.00%.
Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly describes the main change: adding logic to set the operator condition to Progressing when pod identity webhook pods are updating, which aligns with the primary focus of changes in podidentitywebhook_controller.go and status_controller.go.
Stable And Deterministic Test Names ✅ Passed All test names in modified test files use static, deterministic strings with no dynamic content.
Test Structure And Quality ✅ Passed The PR uses standard Go testing with testify assertions and t.Run() subtests, not Ginkgo. Ginkgo is available in the repository but is not used in the changed test files, making this check not applicable to the PR.




@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 9, 2026

@dlom: This pull request references CCO-738 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

This pull request references CCO-739 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

xref: CCO-738
xref: CCO-739
/assign @jstuever

the Pod Identity Webhook controller will now check all pods owned by its deployment for their image version, and indicate progressing if the number of pods doesn't match the desired replicas, or if any of the pods have an older image

Summary by CodeRabbit

  • New Features

  • Added pod identity webhook deployment status tracking with automatic detection of availability and rollout progress.

  • Improvements

  • Enhanced status checking operations with improved context handling for better cancellation and timeout propagation across system components.

  • Tests

  • Improved test organization and coverage with structured subtests for pod identity webhook status validation and condition verification.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
pkg/operator/status/status_controller.go (1)

191-212: Context not fully propagated to Kubernetes API calls.

While syncStatus now accepts a ctx parameter and correctly passes it to handler.GetConditions(ctx, logger) at line 212, the Kubernetes client calls within syncStatus still use context.TODO():

  • Line 195: kubeClient.Get(context.TODO(), ...)
  • Line 269: kubeClient.Create(context.TODO(), ...)
  • Line 285: kubeClient.Status().Update(context.TODO(), ...)

For consistent context propagation (supporting cancellation, deadlines, and tracing), these should use the passed ctx.

♻️ Suggested changes
-	err := kubeClient.Get(context.TODO(), types.NamespacedName{Name: constants.CloudCredClusterOperatorName}, co)
+	err := kubeClient.Get(ctx, types.NamespacedName{Name: constants.CloudCredClusterOperatorName}, co)
-		if err := kubeClient.Create(context.TODO(), co); err != nil {
+		if err := kubeClient.Create(ctx, co); err != nil {
-		if err := kubeClient.Status().Update(context.TODO(), co); err != nil {
+		if err := kubeClient.Status().Update(ctx, co); err != nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/status/status_controller.go` around lines 191 - 212, syncStatus
currently mixes the passed context with context.TODO() in Kubernetes client
calls; replace the context.TODO() usages in the kubeClient.Get,
kubeClient.Create, and kubeClient.Status().Update invocations inside syncStatus
with the function parameter ctx so all API calls use the same propagated context
(note: handler.GetConditions already uses ctx).
pkg/operator/podidentity/podidentitywebhook_controller.go (1)

403-424: Consider using the operatorNamespace constant and clarify nil replicas behavior.

  1. Line 407 hardcodes "openshift-cloud-credential-operator" but the constant operatorNamespace is already defined at line 42.

  2. When deployment.Spec.Replicas is nil, returning an empty webhookPodStatus{} (all zeros) may cause Available() to return false even though Kubernetes defaults nil replicas to 1 and the deployment might actually be available.

♻️ Suggested improvement
 func (r *staticResourceReconciler) CheckPodStatus(ctx context.Context) (*webhookPodStatus, error) {
 	deployment := &appsv1.Deployment{}
 	err := r.client.Get(ctx, client.ObjectKey{
 		Name:      "pod-identity-webhook",
-		Namespace: "openshift-cloud-credential-operator",
+		Namespace: operatorNamespace,
 	}, deployment)
 	if err != nil {
 		return nil, err
 	}
 
 	if deployment.Spec.Replicas == nil {
-		return &webhookPodStatus{}, nil
+		// Kubernetes defaults nil replicas to 1
+		return &webhookPodStatus{
+			desiredReplicas:   1,
+			availableReplicas: deployment.Status.AvailableReplicas,
+			updatedReplicas:   deployment.Status.UpdatedReplicas,
+		}, nil
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/podidentity/podidentitywebhook_controller.go` around lines 403 -
424, In CheckPodStatus, replace the hardcoded namespace string with the existing
operatorNamespace constant and handle nil deployment.Spec.Replicas by treating
it as Kubernetes default of 1 instead of returning an empty webhookPodStatus;
when deployment.Spec.Replicas == nil, set desiredReplicas to 1 and populate
availableReplicas/updatedReplicas from deployment.Status so that Available()
reflects the real deployment state (refer to function CheckPodStatus and the
webhookPodStatus struct).
pkg/operator/podidentity/podidentitywebhook_controller_test.go (1)

344-466: Good test coverage, consider adding a "deployment not found" test case.

The tests thoroughly cover various deployment status scenarios. Consider adding a test case where the deployment doesn't exist to verify error handling in CheckPodStatus.

📝 Additional test case
{
    name:       "Deployment not found",
    deployment: nil,
    expectErr:  true,
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/podidentity/podidentitywebhook_controller_test.go` around lines
344 - 466, Add a test case to TestCheckPodStatus to verify behavior when the
Deployment is missing: include an entry with name "Deployment not found",
deployment: nil, and expectErr: true; ensure the test loop keeps the existing
logic that builds existing runtime.Objects from test.deployment, so when
deployment is nil the fake client has no Deployment and CheckPodStatus (on
staticResourceReconciler) should return a non-nil error which the test asserts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/operator/podidentity/podidentitywebhook_controller.go`:
- Around line 435-455: The current code assigns conditions := r.conditions which
can alias the underlying slice and cause unexpected mutation when append is
used; replace that with a non-aliased copy (e.g., conditions :=
append([]configv1.ClusterOperatorStatusCondition(nil), r.conditions...)) so
appends do not modify r.conditions, and remove the trailing "\n" characters from
the Message fields in the ClusterOperatorStatusCondition entries (the ones built
when !podStatus.Available() and when podStatus.Progressing()) so messages are
clean and consistent.

---

Nitpick comments:
In `@pkg/operator/podidentity/podidentitywebhook_controller_test.go`:
- Around line 344-466: Add a test case to TestCheckPodStatus to verify behavior
when the Deployment is missing: include an entry with name "Deployment not
found", deployment: nil, and expectErr: true; ensure the test loop keeps the
existing logic that builds existing runtime.Objects from test.deployment, so
when deployment is nil the fake client has no Deployment and CheckPodStatus (on
staticResourceReconciler) should return a non-nil error which the test asserts.

In `@pkg/operator/podidentity/podidentitywebhook_controller.go`:
- Around line 403-424: In CheckPodStatus, replace the hardcoded namespace string
with the existing operatorNamespace constant and handle nil
deployment.Spec.Replicas by treating it as Kubernetes default of 1 instead of
returning an empty webhookPodStatus; when deployment.Spec.Replicas == nil, set
desiredReplicas to 1 and populate availableReplicas/updatedReplicas from
deployment.Status so that Available() reflects the real deployment state (refer
to function CheckPodStatus and the webhookPodStatus struct).

In `@pkg/operator/status/status_controller.go`:
- Around line 191-212: syncStatus currently mixes the passed context with
context.TODO() in Kubernetes client calls; replace the context.TODO() usages in
the kubeClient.Get, kubeClient.Create, and kubeClient.Status().Update
invocations inside syncStatus with the function parameter ctx so all API calls
use the same propagated context (note: handler.GetConditions already uses ctx).
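The slice-aliasing hazard flagged in the inline comment above (`conditions := r.conditions` followed by `append`) is easy to reproduce in isolation. This standalone sketch (not the controller's code) shows why `append([]T(nil), s...)` is the safe way to hand out a cached slice:

```go
package main

import "fmt"

func main() {
	// Two conditions with spare capacity, mimicking a cached r.conditions slice.
	base := make([]string, 2, 4)
	base[0], base[1] = "Available=True", "Degraded=False"

	// Plain assignment aliases base's backing array.
	a := base
	a = append(a, "from-a") // written into base's spare capacity
	b := base
	b = append(b, "from-b") // overwrites the slot a just wrote

	fmt.Println(a[2]) // prints "from-b": a was silently mutated via b

	// The copy idiom gives each caller an independent backing array.
	c := append([]string(nil), base...)
	c = append(c, "from-c")
	fmt.Println(a[2], c[2]) // a is unaffected by c's append
}
```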

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6b15769e-7d22-4cfe-82d7-5dccbd847557

📥 Commits

Reviewing files that changed from the base of the PR and between e338ba6 and b348e7d.

📒 Files selected for processing (14)
  • pkg/operator/cleanup/status.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_azure_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_gcp_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_label_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_vsphere_test.go
  • pkg/operator/credentialsrequest/status.go
  • pkg/operator/loglevel/status.go
  • pkg/operator/podidentity/podidentitywebhook_controller.go
  • pkg/operator/podidentity/podidentitywebhook_controller_test.go
  • pkg/operator/secretannotator/status/status.go
  • pkg/operator/status/status_controller.go
  • pkg/operator/status/status_controller_test.go

@codecov

codecov bot commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 88.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.46%. Comparing base (af37b11) to head (2cc41a6).
⚠️ Report is 11 commits behind head on master.

Files with missing lines Patch % Lines
pkg/operator/credentialsrequest/status.go 66.66% 1 Missing and 1 partial ⚠️
pkg/operator/status/status_controller.go 77.77% 1 Missing and 1 partial ⚠️
pkg/operator/cleanup/status.go 0.00% 1 Missing ⚠️
...redentialsrequest/credentialsrequest_controller.go 50.00% 0 Missing and 1 partial ⚠️
pkg/operator/loglevel/status.go 0.00% 1 Missing ⚠️
...rator/podidentity/podidentitywebhook_controller.go 98.18% 1 Missing ⚠️
pkg/operator/secretannotator/status/status.go 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #978      +/-   ##
==========================================
+ Coverage   46.31%   46.46%   +0.14%     
==========================================
  Files          98       98              
  Lines       12225    12309      +84     
==========================================
+ Hits         5662     5719      +57     
- Misses       5913     5940      +27     
  Partials      650      650              
Files with missing lines Coverage Δ
pkg/operator/cleanup/status.go 0.00% <0.00%> (ø)
...redentialsrequest/credentialsrequest_controller.go 44.23% <50.00%> (ø)
pkg/operator/loglevel/status.go 0.00% <0.00%> (ø)
...rator/podidentity/podidentitywebhook_controller.go 44.20% <98.18%> (+16.98%) ⬆️
pkg/operator/secretannotator/status/status.go 0.00% <0.00%> (ø)
pkg/operator/credentialsrequest/status.go 66.66% <66.66%> (ø)
pkg/operator/status/status_controller.go 52.21% <77.77%> (+0.64%) ⬆️

... and 1 file with indirect coverage changes


@dlom dlom force-pushed the CCO-738 branch 2 times, most recently from 6274a2d to 7c3a20b Compare March 9, 2026 18:46
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 9, 2026

@dlom: This pull request references CCO-738 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

This pull request references CCO-739 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

xref: CCO-738
xref: CCO-739
/assign @jstuever

the Pod Identity Webhook controller will now check all pods owned by its deployment for their image version, and indicate progressing if the number of pods doesn't match the desired replicas, or if any of the pods have an older image

Summary by CodeRabbit

  • New Features

  • Pod identity webhook deployment status tracking with automatic detection of availability and rollout progress.

  • Improvements

  • Broader context propagation across status and reconciliation flows for proper cancellation/timeouts.

  • Status progression now explicitly set when operator versions change.

  • Tests

  • Expanded and reorganized tests, adding coverage for webhook status checks and converting many tests to use contextualized subtests.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/operator/podidentity/podidentitywebhook_controller_test.go (1)

471-641: Good test coverage, consider adding error case.

The test cases thoroughly verify condition generation logic, including preservation of existing conditions.

Consider adding a test case for when the deployment is not found to verify GetConditions error handling behavior:

{
    name:       "Deployment not found - returns error",
    deployment: nil,
    expectErr:  true,
},

This would document the expected behavior during bootstrap when the deployment hasn't been created yet.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/podidentity/podidentitywebhook_controller_test.go` around lines
471 - 641, Add a new test case to TestGetConditions to cover the "deployment not
found" error path: update the tests slice (in TestGetConditions) with an entry
where name is "Deployment not found - returns error", deployment is nil, and
expectErr is true; this will exercise staticResourceReconciler.GetConditions and
assert that when no Deployment exists the function returns an error (keep other
fields zero-values and ensure the fake client is built with no runtime objects
so GetConditions returns the expected error).
pkg/operator/credentialsrequest/status.go (1)

45-51: Consider propagating context to GetRelatedObjects.

GetRelatedObjects uses context.TODO() when calling getOperatorState, while GetConditions properly receives and propagates context. This inconsistency means GetRelatedObjects won't respect cancellation or deadlines.

If the Handler interface's GetRelatedObjects signature is updated in a follow-up change, this can be addressed then. Otherwise, consider adding a TODO comment to track this.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/credentialsrequest/status.go` around lines 45 - 51,
GetRelatedObjects currently calls getOperatorState with context.TODO(), which
prevents honoring cancellations/deadlines; update GetRelatedObjects to either
accept and pass a context (preferred: change signature to accept ctx and call
getOperatorState(ctx, logger)) or, if the Handler interface can't change now,
add a clear TODO comment inside GetRelatedObjects noting that context.TODO() is
temporary and must be replaced with a propagated context once the
Handler/GetRelatedObjects signature is updated to accept context; reference
GetRelatedObjects and getOperatorState in the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/operator/status/status_controller_test.go`:
- Around line 458-460: The assertion in the loop currently compares
test.expected.RelatedObjects[i] to itself which always passes; change the
assertion to compare the expected element to the actual result element (e.g.,
replace assert.Equal(t, expected, test.expected.RelatedObjects[i]) with
assert.Equal(t, expected, test.actual.RelatedObjects[i]) or the equivalent
actual/result field used in the test) so the loop verifies actual output against
test.expected.RelatedObjects; update the assert call inside the for loop
accordingly.

---

Nitpick comments:
In `@pkg/operator/credentialsrequest/status.go`:
- Around line 45-51: GetRelatedObjects currently calls getOperatorState with
context.TODO(), which prevents honoring cancellations/deadlines; update
GetRelatedObjects to either accept and pass a context (preferred: change
signature to accept ctx and call getOperatorState(ctx, logger)) or, if the
Handler interface can't change now, add a clear TODO comment inside
GetRelatedObjects noting that context.TODO() is temporary and must be replaced
with a propagated context once the Handler/GetRelatedObjects signature is
updated to accept context; reference GetRelatedObjects and getOperatorState in
the comment.

In `@pkg/operator/podidentity/podidentitywebhook_controller_test.go`:
- Around line 471-641: Add a new test case to TestGetConditions to cover the
"deployment not found" error path: update the tests slice (in TestGetConditions)
with an entry where name is "Deployment not found - returns error", deployment
is nil, and expectErr is true; this will exercise
staticResourceReconciler.GetConditions and assert that when no Deployment exists
the function returns an error (keep other fields zero-values and ensure the fake
client is built with no runtime objects so GetConditions returns the expected
error).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a71ebea7-a4a4-424a-b6a3-b92543e045a7

📥 Commits

Reviewing files that changed from the base of the PR and between b348e7d and 7c3a20b.

📒 Files selected for processing (14)
  • pkg/operator/cleanup/status.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_azure_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_gcp_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_label_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_vsphere_test.go
  • pkg/operator/credentialsrequest/status.go
  • pkg/operator/loglevel/status.go
  • pkg/operator/podidentity/podidentitywebhook_controller.go
  • pkg/operator/podidentity/podidentitywebhook_controller_test.go
  • pkg/operator/secretannotator/status/status.go
  • pkg/operator/status/status_controller.go
  • pkg/operator/status/status_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/operator/credentialsrequest/credentialsrequest_controller_azure_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_label_test.go

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 9, 2026

@dlom: This pull request references CCO-738 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

This pull request references CCO-739 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

xref: CCO-738
xref: CCO-739
/assign @jstuever

the Pod Identity Webhook controller will now check all pods owned by its deployment for their image version, and indicate progressing if the number of pods doesn't match the desired replicas, or if any of the pods have an older image

Summary by CodeRabbit

  • New Features

  • Pod identity webhook deployment status tracking with automatic detection of availability and rollout progress.

  • Improvements

  • Broader context propagation across status and reconciliation flows to support proper cancellation/timeouts.

  • Status progression explicitly set when operator versions change.

  • Tests

  • Expanded and reorganized tests, adding coverage for webhook status checks and converting many tests to contextualized subtests.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/operator/status/status_controller.go (1)

256-265: ⚠️ Potential issue | 🟡 Minor

Progressing condition may have inconsistent Reason/Message when version changes.

When the version changes, progressing.Status is set to ConditionTrue, but the Reason and Message fields are not updated. If no handler explicitly set Progressing, this results in Status=True with Reason="AsExpected" and Message="All is well", which is semantically inconsistent.

Consider setting an appropriate Reason and Message when forcing Progressing to True due to a version change.

🔧 Proposed fix
 	if !reflect.DeepEqual(oldVersions, co.Status.Versions) {
 		logger.WithFields(log.Fields{
 			"old": oldVersions,
 			"new": co.Status.Versions,
 		}).Info("version has changed, updating progressing condition lastTransitionTime")
 		progressing, _ := findClusterOperatorCondition(co.Status.Conditions, configv1.OperatorProgressing)
 		// We know this should be there.
 		progressing.LastTransitionTime = metav1.Now()
 		progressing.Status = configv1.ConditionTrue
+		progressing.Reason = "VersionChanged"
+		progressing.Message = "Operator version is updating"
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/status/status_controller.go` around lines 256 - 265, When
detecting a version change (comparing oldVersions and co.Status.Versions) the
code updates progressing.LastTransitionTime and progressing.Status but does not
update progressing.Reason and progressing.Message, causing an inconsistent
Progressing=True with stale "AsExpected"/"All is well" text; modify the block
that uses findClusterOperatorCondition (the progressing variable) to also set
progressing.Reason and progressing.Message to reflect the forced progress state
(e.g. Reason="VersionChange" and a short Message like "Cluster operator is
progressing due to version update"), ensuring you still call metav1.Now() and
set progressing.Status = configv1.ConditionTrue and that you handle the returned
progressing pointer from findClusterOperatorCondition safely.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/operator/podidentity/podidentitywebhook_controller.go`:
- Around line 429-434: GetConditions currently aborts on CheckPodStatus errors
which causes Reconcile-cached conditions (e.g.,
reasonStaticResourceReconcileFailed stored in r.conditions) to be dropped;
instead, when CheckPodStatus(ctx) returns an error, log it but do not return
early — merge or return the existing r.conditions so syncStatus can see the
degraded state; in other words, in GetConditions keep using r.conditions as the
base, only augment with podStatus when CheckPodStatus succeeds, and ensure the
function returns the conditions (and nil error) rather than nil,error when pod
status lookup fails.
- Around line 389-424: The Progressing check is too loose; update
webhookPodStatus and its usage so rollout isn't marked complete while old surge
pods still exist: add a field to webhookPodStatus (e.g., totalReplicas or
replicas) and populate it in CheckPodStatus from deployment.Status.Replicas,
then change Progressing() to return true unless both updatedReplicas ==
desiredReplicas AND totalReplicas == desiredReplicas (i.e., Progressing returns
!(wps.updatedReplicas == wps.desiredReplicas && wps.replicas ==
wps.desiredReplicas)), updating references to webhookPodStatus, Progressing(),
and CheckPodStatus accordingly.
- Around line 436-437: The copy of r.conditions in GetConditions() protects
against slice aliasing but not concurrent header reads/writes; add a mutex
(e.g., sync.RWMutex) to the reconciler struct (the type that owns r.conditions,
e.g., PodIdentityWebhookReconciler), use r.mu.RLock()/RUnlock() around the
read+copy in GetConditions(), and use r.mu.Lock()/Unlock() around any writes or
full replacements of r.conditions in Reconcile() (and any other methods that
mutate it) so the slice header cannot be concurrently read while being replaced.

---

Outside diff comments:
In `@pkg/operator/status/status_controller.go`:
- Around line 256-265: When detecting a version change (comparing oldVersions
and co.Status.Versions) the code updates progressing.LastTransitionTime and
progressing.Status but does not update progressing.Reason and
progressing.Message, causing an inconsistent Progressing=True with stale
"AsExpected"/"All is well" text; modify the block that uses
findClusterOperatorCondition (the progressing variable) to also set
progressing.Reason and progressing.Message to reflect the forced progress state
(e.g. Reason="VersionChange" and a short Message like "Cluster operator is
progressing due to version update"), ensuring you still call metav1.Now() and
set progressing.Status = configv1.ConditionTrue and that you handle the returned
progressing pointer from findClusterOperatorCondition safely.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a2750233-7005-4a95-a212-07fa73f48031

📥 Commits

Reviewing files that changed from the base of the PR and between 7c3a20b and 3154ec0.

📒 Files selected for processing (14)
  • pkg/operator/cleanup/status.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_azure_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_gcp_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_label_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_vsphere_test.go
  • pkg/operator/credentialsrequest/status.go
  • pkg/operator/loglevel/status.go
  • pkg/operator/podidentity/podidentitywebhook_controller.go
  • pkg/operator/podidentity/podidentitywebhook_controller_test.go
  • pkg/operator/secretannotator/status/status.go
  • pkg/operator/status/status_controller.go
  • pkg/operator/status/status_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/operator/credentialsrequest/credentialsrequest_controller.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_gcp_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_azure_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_vsphere_test.go

The Pod Identity Webhook controller will now report Available: False
when there are no pods available on the deployment, and Progressing:
True when not all pods match the latest deployment spec (motivating
factor: when the image on the deployment spec has been updated)
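
As a rough illustration of the behavior this commit message describes, a minimal helper might derive the two condition values from the deployment's replica counts. `webhookConditions` is a hypothetical name for this sketch, not the controller's actual API.

```go
package main

import "fmt"

// webhookConditions (hypothetical) maps deployment status counts to the two
// condition values the commit message describes.
func webhookConditions(availableReplicas, updatedReplicas, desiredReplicas int32) (available, progressing bool) {
	available = availableReplicas > 0                // Available=False only when no pods are available
	progressing = updatedReplicas != desiredReplicas // Progressing=True until every pod matches the latest spec
	return
}

func main() {
	// Mid-rollout: 1 of 2 pods updated, both old and new pods still serving.
	a, p := webhookConditions(2, 1, 2)
	fmt.Println(a, p)
}
```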
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 9, 2026

@dlom: This pull request references CCO-738 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

This pull request references CCO-739 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

xref: CCO-738
xref: CCO-739
/assign @jstuever

The Pod Identity Webhook controller will now check all pods owned by its deployment for their image version, and report Progressing if the number of pods doesn't match the desired replica count, or if any pod is running an older image

Summary by CodeRabbit

  • New Features

  • Pod identity webhook deployment status tracking with automatic detection of availability and rollout progress.

  • Improvements

  • Wider context propagation across status and reconciliation flows for better cancellation/timeouts and more robust client operations.

  • Status progression explicitly set when operator versions change.

  • Added synchronization around condition handling to improve reliability during concurrent reconciliations.

  • Tests

  • Expanded and reorganized tests, adding coverage for webhook status checks and contextualized subtests.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
pkg/operator/podidentity/podidentitywebhook_controller_test.go (1)

679-696: Consider using int32 parameters directly.

The helper function accepts int parameters and casts them to int32. While this works for typical test values, using int32 parameters directly would be more type-safe and align with the Kubernetes API types.

Suggested improvement
-func testDeploymentWithReplicas(desired, available, updated, total int) *appsv1.Deployment {
+func testDeploymentWithReplicas(desired, available, updated, total int32) *appsv1.Deployment {
 	int32Ptr := func(i int32) *int32 { return &i }
 
 	return &appsv1.Deployment{
 		ObjectMeta: v1.ObjectMeta{
 			Name:      "pod-identity-webhook",
 			Namespace: "openshift-cloud-credential-operator",
 		},
 		Spec: appsv1.DeploymentSpec{
-			Replicas: int32Ptr(int32(desired)),
+			Replicas: int32Ptr(desired),
 		},
 		Status: appsv1.DeploymentStatus{
-			AvailableReplicas: int32(available),
-			UpdatedReplicas:   int32(updated),
-			Replicas:          int32(total),
+			AvailableReplicas: available,
+			UpdatedReplicas:   updated,
+			Replicas:          total,
 		},
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/podidentity/podidentitywebhook_controller_test.go` around lines
679 - 696, The helper testDeploymentWithReplicas currently takes int parameters
and casts them to int32; change its signature to accept int32 parameters
(desired, available, updated, total int32) and update the int32Ptr helper to
accept an int32 directly, remove unnecessary casts inside Spec/Status, and
update any test call sites that pass ints (e.g., convert literals to int32 or
add explicit int32() where the helper is invoked) so types align with Kubernetes
API types.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/operator/podidentity/podidentitywebhook_controller_test.go`:
- Around line 679-696: The helper testDeploymentWithReplicas currently takes int
parameters and casts them to int32; change its signature to accept int32
parameters (desired, available, updated, total int32) and update the int32Ptr
helper to accept an int32 directly, remove unnecessary casts inside Spec/Status,
and update any test call sites that pass ints (e.g., convert literals to int32
or add explicit int32() where the helper is invoked) so types align with
Kubernetes API types.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bce23ab9-feca-4c7e-998b-a1404e2a4bcc

📥 Commits

Reviewing files that changed from the base of the PR and between 3154ec0 and 2cc41a6.

📒 Files selected for processing (14)
  • pkg/operator/cleanup/status.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_azure_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_gcp_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_label_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_vsphere_test.go
  • pkg/operator/credentialsrequest/status.go
  • pkg/operator/loglevel/status.go
  • pkg/operator/podidentity/podidentitywebhook_controller.go
  • pkg/operator/podidentity/podidentitywebhook_controller_test.go
  • pkg/operator/secretannotator/status/status.go
  • pkg/operator/status/status_controller.go
  • pkg/operator/status/status_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/operator/credentialsrequest/credentialsrequest_controller_label_test.go
  • pkg/operator/credentialsrequest/credentialsrequest_controller_gcp_test.go
  • pkg/operator/cleanup/status.go

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 10, 2026

@dlom: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name              | Commit  | Required | Rerun command
ci/prow/e2e-hypershift | 2cc41a6 | true     | /test e2e-hypershift
ci/prow/e2e-upgrade    | 2cc41a6 | true     | /test e2e-upgrade
ci/prow/e2e-aws-qe     | 2cc41a6 | false    | /test e2e-aws-qe
ci/prow/e2e-aws-ovn    | 2cc41a6 | true     | /test e2e-aws-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
