CORENET-6714: Enable Network Observability on Day 0#2925

Open
OlivierCazade wants to merge 9 commits into openshift:master from OlivierCazade:day0

Conversation

@OlivierCazade commented Mar 10, 2026

Add Network Observability Day-0 Installation Support

The replace directives in the go.mod file need to be removed before merging this branch; this can be done once the PR on the API repo is merged.

This PR introduces automatic Day-0 installation of the Network Observability operator, controlled by a new installNetworkObservability field in the Network CR.

Summary

The Cluster Network Operator (CNO) now handles initial deployment of the Network Observability operator and FlowCollector, enabling network flow monitoring out-of-the-box for OpenShift clusters while respecting user preferences and cluster topology.

Key Features

  1. Opt-out Model with SNO Detection

    • Network Observability is installed by default on multi-node clusters
    • Single Node OpenShift (SNO) clusters are excluded by default to conserve resources
    • Users can explicitly enable/disable via spec.installNetworkObservability
  2. Deployment Tracking

    • After successful deployment, the NetworkObservabilityDeployed condition is set
    • All subsequent reconciliations immediately return when this condition is present
    • This prevents reinstallation if users delete the operator or FlowCollector
    • Users maintain full control after initial deployment
  3. Status Reporting

    • New ObservabilityConfig status level for degraded state reporting
    • Comprehensive error handling with clear degraded status messages
    • Status cleared when Network Observability is running or disabled
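A minimal sketch of the decision model described above (names are illustrative; the real controller compares Infrastructure.Status.ControlPlaneTopology against the SingleReplica topology mode):

```go
package main

import "fmt"

// Illustrative stand-ins for the API values; the actual controller reads
// configv1.SingleReplicaTopologyMode from Infrastructure.Status.ControlPlaneTopology.
const (
	topologySingleReplica = "SingleReplica"
	policyEnable          = "Enable"
	policyDisable         = "Disable"
)

// shouldInstall mirrors the documented rules: an explicit Enable/Disable wins;
// otherwise install by default, except on single-node (SNO) topologies.
func shouldInstall(policy, controlPlaneTopology string) bool {
	switch policy {
	case policyEnable:
		return true
	case policyDisable:
		return false
	default: // "" (empty/nil): install on multi-node, skip on SNO
		return controlPlaneTopology != topologySingleReplica
	}
}

func main() {
	fmt.Println(shouldInstall("", "HighlyAvailable"))             // default, multi-node
	fmt.Println(shouldInstall("", topologySingleReplica))         // default, SNO
	fmt.Println(shouldInstall(policyEnable, topologySingleReplica)) // explicit Enable on SNO
	fmt.Println(shouldInstall(policyDisable, "HighlyAvailable"))    // explicit Disable
}
```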

Configuration

The spec.installNetworkObservability field accepts three values:

  • "" (empty/nil): Default behavior - install on multi-node clusters, skip on SNO
  • "Enable": Explicitly enable installation, even on SNO clusters
  • "Disable": Explicitly disable installation
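As a sketch of how a user would set this, assuming the field lands on the cluster-scoped Network config CR as described (the exact API shape depends on the pending openshift/api PR):

```yaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  # Allowed values per this PR: "" (default), "Enable", "Disable"
  installNetworkObservability: Enable
```

Note that the CodeRabbit walkthrough shows the sample config under spec.networkObservability.installationPolicy, so the final field name may differ once the API PR settles.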

RBAC Changes

Added permissions for the CNO to manage:

  • Subscriptions and ClusterServiceVersions (operators.coreos.com)
  • FlowCollectors (flows.netobserv.io)
  • Namespaces for operator deployment

Related

/cc @stleerh

stleerh and others added 5 commits March 10, 2026 12:12
…pport

Implements a new controller to automatically install and manage the Network
Observability Operator via OLM. The controller handles the complete lifecycle
including operator installation, readiness checking, and FlowCollector creation.

Key features:
- Opt-out installation model: Network Observability is installed by default
  unless explicitly disabled via spec.installNetworkObservability
- SNO (Single Node OpenShift) detection: Automatically skips installation on
  SNO clusters unless explicitly enabled to reduce resource consumption
- Comprehensive status reporting: Sets degraded status with detailed error
  messages for all failure scenarios (operator installation, readiness
  timeouts, FlowCollector creation)
- Idempotent reconciliation: Safely handles multiple invocations and
  concurrent reconciliations

Implementation details:
- Added shouldInstallNetworkObservability() function with SNO topology check
  via Infrastructure.Status.ControlPlaneTopology
- Created StatusReporter interface for testability and status management
- Added ObservabilityConfig status level to StatusManager
- Updated RBAC to allow management of OLM resources (Subscriptions,
  ClusterServiceVersions, OperatorGroups)
- Renamed observabilityEnabled to installNetworkObservability in Network spec
  for clarity and consistency with API conventions

Testing:
- 43 comprehensive unit tests covering all scenarios
- 80.3% code coverage including error paths
- Tests for SNO detection, status updates, and reconciliation flows
@openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Mar 10, 2026
@openshift-ci openshift-ci bot requested a review from stleerh March 10, 2026 14:08
@openshift-ci-robot (Contributor) commented Mar 10, 2026

@OlivierCazade: This pull request references CORENET-6714 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Add Network Observability Day-0 Installation Support

The replace directives in the go.mod file need to be removed before merging this branch; this can be done once the PR on the API repo is merged.

This PR introduces automatic Day-0 installation of the Network Observability operator, controlled by a new installNetworkObservability field in the Network CR.

Summary

The Cluster Network Operator (CNO) now handles initial deployment of the Network Observability operator and FlowCollector, enabling network flow monitoring out-of-the-box for OpenShift clusters while respecting user preferences and cluster topology.

Key Features

  1. Opt-out Model with SNO Detection
  • Network Observability is installed by default on multi-node clusters
  • Single Node OpenShift (SNO) clusters are excluded by default to conserve resources
  • Users can explicitly enable/disable via spec.installNetworkObservability
  2. Deployment Tracking
  • After successful deployment, the NetworkObservabilityDeployed condition is set
  • All subsequent reconciliations immediately return when this condition is present
  • This prevents reinstallation if users delete the operator or FlowCollector
  • Users maintain full control after initial deployment
  3. Status Reporting
  • New ObservabilityConfig status level for degraded state reporting
  • Comprehensive error handling with clear degraded status messages
  • Status cleared when Network Observability is running or disabled

Configuration

The spec.installNetworkObservability field accepts three values:

  • "" (empty/nil): Default behavior - install on multi-node clusters, skip on SNO
  • "Enable": Explicitly enable installation, even on SNO clusters
  • "Disable": Explicitly disable installation

RBAC Changes

Added permissions for the CNO to manage:

  • Subscriptions and ClusterServiceVersions (operators.coreos.com)
  • FlowCollectors (flows.netobserv.io)
  • Namespaces for operator deployment

Related

/cc @stleerh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai bot commented Mar 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a Network Observability controller and tests, RBAC ClusterRole/Binding, OLM Operator and FlowCollector manifests, a new StatusLevel constant, two go.mod replace directives, and a sample-config field for networkObservability.installationPolicy.

Changes

Cohort / File(s) Summary
Controller implementation & registration
pkg/controller/observability/observability_controller.go, pkg/controller/add_networkconfig.go
New Observability controller (Add, Reconcile) that installs the NetObserv operator and FlowCollector via server-side-applied manifests, manages Network status conditions, feature-gate and topology gating, and registers the controller with the manager.
Controller tests
pkg/controller/observability/observability_controller_test.go
Extensive unit test suite with helpers, mock status manager, and tests covering install decision logic, SNO vs HA topology, operator/CSV readiness polling, manifest apply behaviors, FlowCollector lifecycle, idempotency, concurrency, and status transitions.
RBAC
manifests/0000_70_cluster-network-operator_02_rbac_observability.yaml
Adds ClusterRole cno-observability and ClusterRoleBinding binding the cluster-network-operator ServiceAccount; grants permissions for namespace management, OLM resources (Subscriptions/CSVs/OperatorGroups), and FlowCollector CRs.
Operator & FlowCollector manifests
manifests/07-observability-operator.yaml, manifests/08-flowcollector.yaml
Adds OperatorGroup and Subscription for netobserv-operator (OLM installation) and a cluster-scoped FlowCollector CR (eBPF agent, Direct deployment, namespace netobserv).
Status tracking
pkg/controller/statusmanager/status_manager.go
Inserted new ObservabilityConfig member into the StatusLevel const block.
Module resolution & sample config
go.mod, sample-config.yaml
Added two replace directives in go.mod remapping a specific github.com/openshift/api version to an alternate fork/version; added spec.networkObservability.installationPolicy field to sample-config.yaml.
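A minimal sketch of what the OLM manifests described above typically look like (the channel and catalog source values here are assumptions for illustration; the namespaces come from the review's mention of openshift-netobserv-operator and netobserv, and the actual manifests in this PR are authoritative):

```yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-netobserv-operator
  namespace: openshift-netobserv-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: netobserv-operator
  namespace: openshift-netobserv-operator
spec:
  name: netobserv-operator
  channel: stable            # assumed channel
  source: redhat-operators   # assumed catalog source
  sourceNamespace: openshift-marketplace
```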

Sequence Diagram

sequenceDiagram
    participant Controller as Observability Controller
    participant K8sAPI as Kubernetes API
    participant Operator as NetObserv Operator
    participant FlowCollector as FlowCollector Resource
    
    Controller->>K8sAPI: Reconcile Network CR (cluster)
    activate Controller
    
    Controller->>Controller: Check if should install<br/>(based on spec & topology)
    
    alt Should Install
        Controller->>K8sAPI: Create namespaces<br/>(openshift-netobserv-operator, netobserv)
        
        Controller->>K8sAPI: Apply operator manifest<br/>(OperatorGroup, Subscription)
        activate Operator
        
        loop Poll for Readiness
            Controller->>K8sAPI: Check ClusterServiceVersion status
            K8sAPI-->>Controller: CSV status
            Note over Controller: Wait for Succeeded state
        end
        Operator-->>Controller: Operator ready
        deactivate Operator
        
        Controller->>K8sAPI: Check if FlowCollector exists
        
        alt FlowCollector Missing
            Controller->>K8sAPI: Apply FlowCollector manifest
            K8sAPI->>FlowCollector: Create resource
        end
        
        Controller->>K8sAPI: Update Network status<br/>(NetworkObservabilityDeployed condition)
    end
    
    deactivate Controller
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 50.82%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
  • Test Structure And Quality — ⚠️ Warning. The test file has critical quality issues: 128 Expect assertions lack meaningful failure messages, the concurrent reconciliation test has a race condition with Gomega assertions in goroutines, and 55 context operations use context.TODO() instead of properly scoped contexts. Resolution: add failure messages to all Expect calls, refactor the concurrent test to collect errors in a channel and assert in the main goroutine, and replace context.TODO() with properly scoped contexts (Background or WithTimeout).

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed. The title accurately and concisely summarizes the main objective: enabling Network Observability installation on Day 0, the primary feature introduced by this changeset.
  • Stable And Deterministic Test Names — ✅ Passed. The test file uses the standard Go testing framework with static, descriptive test names; the Ginkgo check is not applicable.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/Masterminds/semver@v1.5.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/Masterminds/sprig/v3@v3.2.3: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/containernetworking/cni@v0.8.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ghodss/yaml@v1.0.1-0.20190212211648-25d852aebe32: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-bindata/go-bindata@v3.1.2+incompatible: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/onsi/gomega@v1.38.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ope

... [truncated 17367 characters] ...

quired in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kms@v0.34.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kube-aggregator@v0.34.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/structured-merge-diff/v6@v6.3.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/openshift/api@v0.0.0-20260116192047-6fb7fdae95fd: is replaced in go.mod, but not marked as replaced in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n"



Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci bot (Contributor) commented Mar 10, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: OlivierCazade
Once this PR has been reviewed and has the lgtm label, please assign kyrtapz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot (Contributor) commented Mar 10, 2026

@OlivierCazade: This pull request references CORENET-6714 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details


Summary by CodeRabbit

  • New Features
  • Network Observability is now installable and manageable through cluster configuration—users can enable or disable network observability monitoring for their cluster.
  • Automatic deployment and lifecycle management of the observability operator and flow collection resources.
  • Network configuration status now reflects observability deployment state and health.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (4)
manifests/0000_70_cluster-network-operator_02_rbac_observability.yaml (1)

11-14: Consider adding watch and delete verbs for OLM resources.

The controller may need watch to properly track subscription and CSV state changes via informers. Additionally, delete permission on subscriptions might be needed if you ever want to support uninstallation or cleanup scenarios.

🔧 Suggested addition
   # Manage OLM resources for operator installation
   - apiGroups: ["operators.coreos.com"]
     resources: ["subscriptions", "clusterserviceversions", "operatorgroups"]
-    verbs: ["get", "list", "create", "update", "patch"]
+    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@manifests/0000_70_cluster-network-operator_02_rbac_observability.yaml` around
lines 11 - 14, The RBAC rule for apiGroups "operators.coreos.com" covering
resources ["subscriptions", "clusterserviceversions", "operatorgroups"] is
missing the "watch" and "delete" verbs; update the verbs array for that rule
(for the resources "subscriptions", "clusterserviceversions", and
"operatorgroups") to include "watch" (so informers can track state changes) and
"delete" (to allow cleanup/uninstallation) in addition to the existing verbs.
pkg/controller/observability/observability_controller_test.go (1)

1512-1518: Test creates files in working directory - may cause side effects.

This test creates a manifests/ directory and writes FlowCollectorYAML in the working directory. While defer os.Remove(FlowCollectorYAML) removes the file, it doesn't remove the manifests directory, which could persist across test runs.

🔧 Suggested fix for proper cleanup
 	err := os.MkdirAll("manifests", 0755)
 	g.Expect(err).NotTo(HaveOccurred())
+	defer os.RemoveAll("manifests")  // Clean up directory

 	// Create the FlowCollector manifest at the expected path
 	err = os.WriteFile(FlowCollectorYAML, []byte(flowCollectorManifest), 0644)
 	g.Expect(err).NotTo(HaveOccurred())
-	defer os.Remove(FlowCollectorYAML)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller_test.go` around lines
1512 - 1518, The test creates a "manifests" directory and a file at
FlowCollectorYAML using os.MkdirAll and os.WriteFile but only defers removal of
the file, leaving the manifests directory behind; update the test to always
clean up the directory by deferring a call to remove the directory (e.g. defer
os.RemoveAll("manifests")) immediately after os.MkdirAll succeeds (and keep or
replace the existing defer os.Remove(FlowCollectorYAML)), so that both the file
created from flowCollectorManifest and the manifests directory are removed after
the test.
pkg/controller/observability/observability_controller.go (2)

356-392: Potential race condition in markNetworkObservabilityDeployed.

The function has a TOCTOU (time-of-check-time-of-use) pattern: it reads the latest Network CR, modifies conditions, then updates. If another controller or user modifies the Network CR between Get and Update, the update will fail with a conflict error (which would be retried by the caller).

This is acceptable since Kubernetes retries on conflicts, but consider using a retry loop here for better resilience.

🔧 Suggested: Add retry for conflict resilience
 func (r *ReconcileObservability) markNetworkObservabilityDeployed(ctx context.Context, network *configv1.Network) error {
+	return retry.RetryOnConflict(retry.DefaultBackoff, func() error {
+		return r.doMarkNetworkObservabilityDeployed(ctx)
+	})
+}
+
+func (r *ReconcileObservability) doMarkNetworkObservabilityDeployed(ctx context.Context) error {
 	// Check if condition already exists and is true
-	for _, condition := range network.Status.Conditions {
+	latest := &configv1.Network{}
+	if err := r.client.Get(ctx, types.NamespacedName{Name: "cluster"}, latest); err != nil {
+		return err
+	}
+
+	for _, condition := range latest.Status.Conditions {
 		if condition.Type == NetworkObservabilityDeployed && condition.Status == metav1.ConditionTrue {
 			return nil // Already marked as deployed
 		}
 	}
-
-	// Get the latest version of the Network CR to avoid conflicts
-	latest := &configv1.Network{}
-	if err := r.client.Get(ctx, types.NamespacedName{Name: "cluster"}, latest); err != nil {
-		return err
-	}
 	// ... rest of the function
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller.go` around lines 356 -
392, markNetworkObservabilityDeployed currently does a Get -> modify ->
Status().Update which can fail on a concurrent update; wrap the
Get/modify/Status().Update sequence in a retry loop using
k8s.io/apimachinery/pkg/util/wait.RetryOnConflict (or retry.RetryOnConflict) to
retry on conflicts, re-fetching the latest Network (using r.client.Get) inside
the loop before applying the condition and calling r.client.Status().Update so
transient conflicts are retried safely; keep the same condition logic and return
the final error from the retry.

288-319: Polling uses parent context which may already have timeout pressure.

waitForNetObservOperator uses the passed context for polling, but also defines its own checkTimeout (10 minutes). If the caller's context has a shorter deadline, the poll will exit early with the caller's context error rather than context.DeadlineExceeded from wait.PollUntilContextTimeout.

The current implementation may work correctly in practice since the controller-runtime context typically doesn't have a deadline, but this could cause unexpected behavior in tests or if the calling context changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller.go` around lines 288 -
319, The polling uses the caller's ctx which can have a shorter deadline;
instead create a new timeout-bound context for the poll so it always runs up to
checkTimeout: inside waitForNetObservOperator, call
context.WithTimeout(context.Background(), checkTimeout) (defer cancel) and pass
that new ctx to wait.PollUntilContextTimeout (keep checkInterval, checkTimeout,
condition as before); reference function waitForNetObservOperator and variables
checkInterval/checkTimeout to locate where to replace the ctx usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@go.mod`:
- Line 166: The go.mod contains a temporary replace directive pointing to a
personal fork for openshift/api#2752; do not merge until the upstream PR is
merged. Add a CI/merge-block gate that fails or flags the PR when the replace
directive "replace github.com/openshift/api" is present, add a clear TODO
comment adjacent to the replace line referencing openshift/api#2752 and stating
it must be removed when that PR merges, and add repository-level tracking (e.g.,
in the PR description or a maintainer checklist) to actively monitor
openshift/api#2752 and remove the replace directive immediately once the
upstream PR is approved and merged.

In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 1250-1265: The test spawns goroutines that call r.Reconcile and
uses g.Expect inside those goroutines, which can panic because Gomega's fail
handler (t.FailNow) must run in the main test goroutine; fix by removing
g.Expect from the goroutines and collecting errors via a channel (e.g., make
errCh chan error), have each goroutine send its err to errCh after calling
r.Reconcile(req), then in the main goroutine close/iterate the channel and
assert with g.Expect that all received errors are nil; alternatively create a
per-goroutine Gomega tied to a testing.T (NewWithT) if you need per-goroutine
assertions—reference r.Reconcile, req, and the done/err channel approach to
locate where to change code.

In `@pkg/controller/observability/observability_controller.go`:
- Around line 169-173: The call to markNetworkObservabilityDeployed(err) is only
logged on failure which prevents the controller from retrying and can lead to
repeated FlowCollector creation; modify the reconciler to return the error
instead of just logging it so the reconciliation is requeued—locate the call to
r.markNetworkObservabilityDeployed(ctx, &network) in the reconciler function
(the surrounding code that currently does klog.Warningf on error) and replace
the log-without-return with returning a wrapped/annotated error (or simply
return err) so the controller will retry the reconcile loop; keep any logging
but ensure the function exits with an error when
markNetworkObservabilityDeployed fails.
- Around line 144-152: When waitForNetObservOperator(ctx) times out you
currently return ctrl.Result{RequeueAfter: 0}, nil which prevents any automatic
retry; update the timeout branch in the reconciler to either return a non-zero
RequeueAfter (e.g., RequeueAfter: time.Minute*5) so the controller will re-check
later, or return a wrapped error instead of nil to trigger error-based
requeueing; adjust the block around waitForNetObservOperator, the
r.status.SetDegraded call, and the return statement so the controller will retry
(refer to waitForNetObservOperator,
r.status.SetDegraded(statusmanager.ObservabilityConfig, "OperatorNotReady",
...), and the existing return ctrl.Result{RequeueAfter: 0}, nil).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 826708e4-9aa4-4a6f-941f-be6e5a7c5d63

📥 Commits

Reviewing files that changed from the base of the PR and between 88891a9 and b1eeed3.

⛔ Files ignored due to path filters (36)
  • go.sum is excluded by !**/*.sum
  • vendor/github.com/openshift/api/config/v1/types_network.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.gitignore is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.golangci.yml is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/.gomodcheck.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/CONTRIBUTING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/FAQ.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/Makefile is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/OWNERS is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/OWNERS_ALIASES is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/README.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/RELEASE.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/SECURITY_CONTACTS is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/TMP-LOGGING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/VERSIONING.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/alias.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/code-of-conduct.md is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/controller.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/options.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/builder/webhook.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/client/config/config.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/client/config/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/conversion/conversion.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/doc.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_posix.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/manager/signals/signal_windows.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/scheme/scheme.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/webhook/conversion/conversion.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/webhook/conversion/decoder.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/controller-runtime/pkg/webhook/conversion/metrics/metrics.go is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (9)
  • go.mod
  • manifests/0000_70_cluster-network-operator_02_rbac_observability.yaml
  • manifests/07-observability-operator.yaml
  • manifests/08-flowcollector.yaml
  • pkg/controller/add_networkconfig.go
  • pkg/controller/observability/observability_controller.go
  • pkg/controller/observability/observability_controller_test.go
  • pkg/controller/statusmanager/status_manager.go
  • sample-config.yaml

go.mod Outdated
sigs.k8s.io/structured-merge-diff/v6 v6.3.0 // indirect
)

replace github.com/openshift/api v0.0.0-20260116192047-6fb7fdae95fd => github.com/OlivierCazade/api v0.0.0-20260310123814-0d1d1592146d

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify the fork repository and commit hashes are valid

echo "Checking if fork repository exists..."
gh api repos/OlivierCazade/api --jq '.full_name' 2>/dev/null || echo "Fork repository not found or not accessible"

echo -e "\nChecking if fork commit exists..."
gh api repos/OlivierCazade/api/commits/0d1d1592146d --jq '.sha' 2>/dev/null || echo "Fork commit not found"

echo -e "\nChecking if original commit exists in upstream..."
gh api repos/openshift/api/commits/6fb7fdae95fd --jq '.sha' 2>/dev/null || echo "Original commit not found"

echo -e "\nChecking status of related API PR..."
gh pr view 2752 --repo openshift/api --json state,title,url --jq '{state,title,url}' 2>/dev/null || echo "API PR #2752 not found or not accessible"

Repository: openshift/cluster-network-operator

Length of output: 708


🏁 Script executed:

#!/bin/bash
# Try public GitHub API access without authentication
echo "Checking fork repository (public API)..."
curl -s "https://api.github.com/repos/OlivierCazade/api" | jq -r '.full_name // .message' 2>/dev/null || echo "Unable to access fork repository"

echo -e "\nChecking upstream openshift/api repository..."
curl -s "https://api.github.com/repos/openshift/api" | jq -r '.full_name // .message' 2>/dev/null || echo "Unable to access upstream repository"

Repository: openshift/cluster-network-operator

Length of output: 497


🌐 Web query:

openshift/api#2752 pull request status 2026

💡 Result:

As of March 10, 2026, openshift/api#2752 (“CORENET-6714: Enable Network Observability on Day 0”) is OPEN (not merged/closed). [1]

Current notable status items:

  • Not approved yet (openshift-ci bot: “This PR is NOT APPROVED”; needs /approve from an approver). [2]
  • Awaiting requested review from @stleerh. [1]
  • Automated pre-merge checks reported 4 passed / 1 failed (inconclusive), with the inconclusive failure being a “Description check” mismatch vs the actual implementation details. [2]

Sources: [1], [2]


🌐 Web query:

OlivierCazade/api github fork openshift

💡 Result:

OlivierCazade/api is a GitHub repository that is a fork of openshift/api (the canonical OpenShift API type definitions / serialization code). [1][2]

Repo (fork): https://github.com/OlivierCazade/api [1]
Upstream: https://github.com/openshift/api [2]


Do not merge until openshift/api#2752 is merged; establish blocking gate.

This temporary replace directive points to a personal fork to test unmerged API changes from openshift/api#2752. The fork repository exists and is legitimate; however, as of March 10, 2026, the upstream PR is OPEN and NOT APPROVED, meaning the replace directive could remain in production code indefinitely if this PR merges first.

Required actions:

  1. Add a CI check or merge-blocking comment to prevent merge while the replace directive is present
  2. Add a TODO comment referencing openshift/api#2752 (CORENET-6714: Enable Network Observability on Day 0) to make removal explicit
  3. Track the upstream PR status and remove this directive immediately when openshift/api#2752 merges
  4. Do not merge this PR unless openshift/api#2752 is approved and merged, or the replace directive is removed
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go.mod` at line 166, The go.mod contains a temporary replace directive
pointing to a personal fork for openshift/api#2752; do not merge until the
upstream PR is merged. Add a CI/merge-block gate that fails or flags the PR when
the replace directive "replace github.com/openshift/api" is present, add a clear
TODO comment adjacent to the replace line referencing openshift/api#2752 and
stating it must be removed when that PR merges, and add repository-level
tracking (e.g., in the PR description or a maintainer checklist) to actively
monitor openshift/api#2752 and remove the replace directive immediately once the
upstream PR is approved and merged.

Comment on lines +1250 to +1265
// Run 5 concurrent reconciliations
done := make(chan bool, 5)
for i := 0; i < 5; i++ {
go func() {
_, err := r.Reconcile(context.TODO(), req)
// All should complete without error (idempotent)
g.Expect(err).NotTo(HaveOccurred())
done <- true
}()
}

// Wait for all to complete
for i := 0; i < 5; i++ {
<-done
}
}

⚠️ Potential issue | 🟡 Minor

Potential race condition in concurrent reconciliation test.

Calling g.Expect() inside goroutines can cause test panics if an assertion fails, because Gomega's default fail handler calls t.FailNow() which cannot be called from a goroutine other than the one running the test. Additionally, errors are not properly collected across goroutines.

🔧 Suggested fix using error channels
 	// Run 5 concurrent reconciliations
 	done := make(chan bool, 5)
+	errChan := make(chan error, 5)
 	for i := 0; i < 5; i++ {
 		go func() {
 			_, err := r.Reconcile(context.TODO(), req)
-			// All should complete without error (idempotent)
-			g.Expect(err).NotTo(HaveOccurred())
+			errChan <- err
 			done <- true
 		}()
 	}

 	// Wait for all to complete
 	for i := 0; i < 5; i++ {
 		<-done
+		err := <-errChan
+		g.Expect(err).NotTo(HaveOccurred())
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller_test.go` around lines
1250 - 1265, The test spawns goroutines that call r.Reconcile and uses g.Expect
inside those goroutines, which can panic because Gomega's fail handler
(t.FailNow) must run in the main test goroutine; fix by removing g.Expect from
the goroutines and collecting errors via a channel (e.g., make errCh chan
error), have each goroutine send its err to errCh after calling
r.Reconcile(req), then in the main goroutine close/iterate the channel and
assert with g.Expect that all received errors are nil; alternatively create a
per-goroutine Gomega tied to a testing.T (NewWithT) if you need per-goroutine
assertions—reference r.Reconcile, req, and the done/err channel approach to
locate where to change code.

Comment on lines +144 to +152
if err := r.waitForNetObservOperator(ctx); err != nil {
if err == context.DeadlineExceeded {
klog.Errorf("Timed out waiting for Network Observability Operator to be ready after %v. Stopping reconciliation.", checkTimeout)
r.status.SetDegraded(statusmanager.ObservabilityConfig, "OperatorNotReady", fmt.Sprintf("Timed out waiting for Network Observability Operator to be ready after %v", checkTimeout))
return ctrl.Result{RequeueAfter: 0}, nil // Don't requeue
}
r.status.SetDegraded(statusmanager.ObservabilityConfig, "WaitOperatorError", fmt.Sprintf("Failed waiting for Network Observability Operator: %v", err))
return ctrl.Result{}, err
}

⚠️ Potential issue | 🟠 Major

Timeout handling stops reconciliation without requeue.

When waitForNetObservOperator times out (after 10 minutes), the controller returns RequeueAfter: 0 without an error, which means no automatic retry will occur. The operator installation may eventually succeed, but the controller won't notice and won't create the FlowCollector.

Consider returning a non-zero RequeueAfter to check again later, or returning an error to trigger retry via the error-based requeue mechanism.

🔧 Suggested fix to enable retry
 	if err := r.waitForNetObservOperator(ctx); err != nil {
 		if err == context.DeadlineExceeded {
 			klog.Errorf("Timed out waiting for Network Observability Operator to be ready after %v. Stopping reconciliation.", checkTimeout)
 			r.status.SetDegraded(statusmanager.ObservabilityConfig, "OperatorNotReady", fmt.Sprintf("Timed out waiting for Network Observability Operator to be ready after %v", checkTimeout))
-			return ctrl.Result{RequeueAfter: 0}, nil // Don't requeue
+			return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil // Retry later to check if operator eventually succeeds
 		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if err := r.waitForNetObservOperator(ctx); err != nil {
if err == context.DeadlineExceeded {
klog.Errorf("Timed out waiting for Network Observability Operator to be ready after %v. Stopping reconciliation.", checkTimeout)
r.status.SetDegraded(statusmanager.ObservabilityConfig, "OperatorNotReady", fmt.Sprintf("Timed out waiting for Network Observability Operator to be ready after %v", checkTimeout))
return ctrl.Result{RequeueAfter: 0}, nil // Don't requeue
}
r.status.SetDegraded(statusmanager.ObservabilityConfig, "WaitOperatorError", fmt.Sprintf("Failed waiting for Network Observability Operator: %v", err))
return ctrl.Result{}, err
}
if err := r.waitForNetObservOperator(ctx); err != nil {
if err == context.DeadlineExceeded {
klog.Errorf("Timed out waiting for Network Observability Operator to be ready after %v. Stopping reconciliation.", checkTimeout)
r.status.SetDegraded(statusmanager.ObservabilityConfig, "OperatorNotReady", fmt.Sprintf("Timed out waiting for Network Observability Operator to be ready after %v", checkTimeout))
return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil // Retry later to check if operator eventually succeeds
}
r.status.SetDegraded(statusmanager.ObservabilityConfig, "WaitOperatorError", fmt.Sprintf("Failed waiting for Network Observability Operator: %v", err))
return ctrl.Result{}, err
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller.go` around lines 144 -
152, When waitForNetObservOperator(ctx) times out you currently return
ctrl.Result{RequeueAfter: 0}, nil which prevents any automatic retry; update the
timeout branch in the reconciler to either return a non-zero RequeueAfter (e.g.,
RequeueAfter: time.Minute*5) so the controller will re-check later, or return a
wrapped error instead of nil to trigger error-based requeueing; adjust the block
around waitForNetObservOperator, the r.status.SetDegraded call, and the return
statement so the controller will retry (refer to waitForNetObservOperator,
r.status.SetDegraded(statusmanager.ObservabilityConfig, "OperatorNotReady",
...), and the existing return ctrl.Result{RequeueAfter: 0}, nil).

Comment on lines +169 to +173
// Mark as deployed in Network CR status
if err := r.markNetworkObservabilityDeployed(ctx, &network); err != nil {
klog.Warningf("Failed to update Network Observability deployment status: %v", err)
}


⚠️ Potential issue | 🟡 Minor

Failure to mark deployment status could cause repeated FlowCollector creation attempts.

The error from markNetworkObservabilityDeployed is only logged, not returned. If this fails, subsequent reconciliations will not see the NetworkObservabilityDeployed condition and may attempt to re-create the FlowCollector, potentially causing conflicts.

Consider either returning this error or implementing a retry mechanism.

🔧 Option: Return error to ensure retry
 	// Mark as deployed in Network CR status
 	if err := r.markNetworkObservabilityDeployed(ctx, &network); err != nil {
-		klog.Warningf("Failed to update Network Observability deployment status: %v", err)
+		klog.Warningf("Failed to update Network Observability deployment status: %v", err)
+		// Continue anyway - the deployment itself succeeded, and we'll retry on next reconciliation
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Mark as deployed in Network CR status
if err := r.markNetworkObservabilityDeployed(ctx, &network); err != nil {
klog.Warningf("Failed to update Network Observability deployment status: %v", err)
}
// Mark as deployed in Network CR status
if err := r.markNetworkObservabilityDeployed(ctx, &network); err != nil {
klog.Warningf("Failed to update Network Observability deployment status: %v", err)
// Continue anyway - the deployment itself succeeded, and we'll retry on next reconciliation
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller.go` around lines 169 -
173, The call to markNetworkObservabilityDeployed(err) is only logged on failure
which prevents the controller from retrying and can lead to repeated
FlowCollector creation; modify the reconciler to return the error instead of
just logging it so the reconciliation is requeued—locate the call to
r.markNetworkObservabilityDeployed(ctx, &network) in the reconciler function
(the surrounding code that currently does klog.Warningf on error) and replace
the log-without-return with returning a wrapped/annotated error (or simply
return err) so the controller will retry the reconcile loop; keep any logging
but ensure the function exits with an error when
markNetworkObservabilityDeployed fails.


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
pkg/controller/observability/observability_controller_test.go (1)

1268-1283: ⚠️ Potential issue | 🟡 Minor

Race condition risk: g.Expect called inside goroutines.

Gomega's fail handler calls t.FailNow() which must only be called from the test goroutine. If an assertion fails inside a spawned goroutine, this can cause a test panic.

Suggested fix using error channel
 	// Run 5 concurrent reconciliations
 	done := make(chan bool, 5)
+	errChan := make(chan error, 5)
 	for i := 0; i < 5; i++ {
 		go func() {
 			_, err := r.Reconcile(context.TODO(), req)
-			// All should complete without error (idempotent)
-			g.Expect(err).NotTo(HaveOccurred())
+			errChan <- err
 			done <- true
 		}()
 	}

 	// Wait for all to complete
 	for i := 0; i < 5; i++ {
 		<-done
+		err := <-errChan
+		g.Expect(err).NotTo(HaveOccurred())
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller_test.go` around lines
1268 - 1283, The test currently calls g.Expect inside spawned goroutines (see
r.Reconcile, req, and done), which risks calling Gomega's FailNow from non-test
goroutines; instead capture errors in a channel from each goroutine and perform
the assertion in the main test goroutine. Spawn the 5 goroutines to call
r.Reconcile and send any returned error (or nil) into an errs channel, then
after reading all results from errs in the main goroutine use
g.Expect(err).NotTo(HaveOccurred()) for each entry, removing g.Expect from
inside the goroutines.
pkg/controller/observability/observability_controller.go (1)

144-152: ⚠️ Potential issue | 🟠 Major

Timeout stops reconciliation permanently without requeue.

When waitForNetObservOperator times out, returning RequeueAfter: 0 with no error means the controller stops trying. If the operator eventually succeeds, this controller won't notice and won't create the FlowCollector.

Consider returning a non-zero RequeueAfter to periodically check if the operator eventually becomes ready:

Suggested fix
 	if err := r.waitForNetObservOperator(ctx); err != nil {
 		if err == context.DeadlineExceeded {
 			klog.Errorf("Timed out waiting for Network Observability Operator to be ready after %v. Stopping reconciliation.", checkTimeout)
 			r.status.SetDegraded(statusmanager.ObservabilityConfig, "OperatorNotReady", fmt.Sprintf("Timed out waiting for Network Observability Operator to be ready after %v", checkTimeout))
-			return ctrl.Result{RequeueAfter: 0}, nil // Don't requeue
+			return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil // Retry later
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller.go` around lines 144 -
152, When waitForNetObservOperator(ctx) returns context.DeadlineExceeded the
current code returns ctrl.Result{RequeueAfter: 0} which stops reconciliation
permanently; change that return to schedule a future requeue (e.g.,
ctrl.Result{RequeueAfter: someNonZeroDuration}) so the controller will
periodically re-check for the operator and eventually create the FlowCollector.
Update the branch handling DeadlineExceeded in the Reconcile logic (the block
around waitForNetObservOperator and r.status.SetDegraded) to return a non-zero
RequeueAfter (choose a sensible poll interval constant or reuse checkTimeout)
instead of 0 and keep returning nil error.
🧹 Nitpick comments (1)
pkg/controller/observability/observability_controller_test.go (1)

1507-1572: Test creates manifest file in working directory.

This test writes to manifests/08-flowcollector.yaml in the working directory. While the cleanup with defer os.Remove() handles the file, consider:

  1. If the test panics before defer is registered (lines 1534-1536), the file persists
  2. Could conflict with parallel tests using the same path

Using a temp directory and temporarily overriding the FlowCollectorYAML constant (if possible) would be more robust, but this is acceptable for an integration-style test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller_test.go` around lines
1507 - 1572, TestReconcile_FirstTimeDeploymentSetsCondition writes to a fixed
manifests path (FlowCollectorYAML) which can leak or conflict; fix by creating a
temporary directory with os.MkdirTemp at test start, write the FlowCollector
manifest into that temp dir (use os.Create or os.WriteFile), set or inject the
FlowCollectorYAML path used by the reconciler to point to the temp file (or
refactor the code to accept a manifestPath parameter) before calling Reconcile,
and register defer os.RemoveAll(tempDir) immediately after creating the temp
directory to guarantee cleanup even on panic; update references to
FlowCollectorYAML in the test to use the temp file path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 1268-1283: The test currently calls g.Expect inside spawned
goroutines (see r.Reconcile, req, and done), which risks calling Gomega's
FailNow from non-test goroutines; instead capture errors in a channel from each
goroutine and perform the assertion in the main test goroutine. Spawn the 5
goroutines to call r.Reconcile and send any returned error (or nil) into an errs
channel, then after reading all results from errs in the main goroutine use
g.Expect(err).NotTo(HaveOccurred()) for each entry, removing g.Expect from
inside the goroutines.

In `@pkg/controller/observability/observability_controller.go`:
- Around line 144-152: When waitForNetObservOperator(ctx) returns
context.DeadlineExceeded the current code returns ctrl.Result{RequeueAfter: 0}
which stops reconciliation permanently; change that return to schedule a future
requeue (e.g., ctrl.Result{RequeueAfter: someNonZeroDuration}) so the controller
will periodically re-check for the operator and eventually create the
FlowCollector. Update the branch handling DeadlineExceeded in the Reconcile
logic (the block around waitForNetObservOperator and r.status.SetDegraded) to
return a non-zero RequeueAfter (choose a sensible poll interval constant or
reuse checkTimeout) instead of 0 and keep returning nil error.

---

Nitpick comments:
In `@pkg/controller/observability/observability_controller_test.go`:
- Around line 1507-1572: TestReconcile_FirstTimeDeploymentSetsCondition writes
to a fixed manifests path (FlowCollectorYAML) which can leak or conflict; fix by
creating a temporary directory with os.MkdirTemp at test start, write the
FlowCollector manifest into that temp dir (use os.Create or os.WriteFile), set
or inject the FlowCollectorYAML path used by the reconciler to point to the temp
file (or refactor the code to accept a manifestPath parameter) before calling
Reconcile, and register defer os.RemoveAll(tempDir) immediately after creating
the temp directory to guarantee cleanup even on panic; update references to
FlowCollectorYAML in the test to use the temp file path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b46e56ed-85bd-48c6-8a11-d0576b268eb9

📥 Commits

Reviewing files that changed from the base of the PR and between b1eeed3 and 9cc8e57.

⛔ Files ignored due to path filters (8)
  • go.sum is excluded by !**/*.sum
  • vendor/github.com/openshift/api/config/v1/types_network.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/features.md is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/features/features.go is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (4)
  • go.mod
  • pkg/controller/observability/observability_controller.go
  • pkg/controller/observability/observability_controller_test.go
  • sample-config.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • go.mod
  • sample-config.yaml

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 17, 2026

@OlivierCazade: This pull request references CORENET-6714 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Add Network Observability Day-0 Installation Support

The replace directive in the go.mod file needs to be removed before merging this branch; this can be done once the PR on the API repo is merged

This PR introduces automatic Day-0 installation of the Network Observability operator, controlled by a new installNetworkObservability field in the Network CR.

Summary

The Cluster Network Operator (CNO) now handles initial deployment of the Network Observability operator and FlowCollector, enabling network flow monitoring out-of-the-box for OpenShift clusters while respecting user preferences and cluster topology.

Key Features

  1. Opt-out Model with SNO Detection
  • Network Observability is installed by default on multi-node clusters
  • Single Node OpenShift (SNO) clusters are excluded by default to conserve resources
  • Users can explicitly enable/disable via spec.installNetworkObservability
  2. Deployment Tracking
  • After successful deployment, the NetworkObservabilityDeployed condition is set
  • All subsequent reconciliations immediately return when this condition is present
  • This prevents reinstallation if users delete the operator or FlowCollector
  • Users maintain full control after initial deployment
  3. Status Reporting
  • New ObservabilityConfig status level for degraded state reporting
  • Comprehensive error handling with clear degraded status messages
  • Status cleared when Network Observability is running or disabled

Configuration

The spec.installNetworkObservability field accepts three values:

  • "" (empty/nil): Default behavior - install on multi-node clusters, skip on SNO
  • "Enable": Explicitly enable installation, even on SNO clusters
  • "Disable": Explicitly disable installation

RBAC Changes

Added permissions for the CNO to manage:

  • Subscriptions and ClusterServiceVersions (operators.coreos.com)
  • FlowCollectors (flows.netobserv.io)
  • Namespaces for operator deployment

Related

/cc @stleerh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
pkg/controller/observability/observability_controller.go (2)

157-162: ⚠️ Potential issue | 🟠 Major

Timeout path disables automatic recovery

Line 161 returns ctrl.Result{RequeueAfter: 0}, nil on timeout, so reconciliation stops and may never create the FlowCollector even if the operator becomes ready later.

Suggested fix
 	if err := r.waitForNetObservOperator(ctx); err != nil {
 		if err == context.DeadlineExceeded {
 			klog.Errorf("Timed out waiting for Network Observability Operator to be ready after %v. Stopping reconciliation.", checkTimeout)
 			r.status.SetDegraded(statusmanager.ObservabilityConfig, "OperatorNotReady", fmt.Sprintf("Timed out waiting for Network Observability Operator to be ready after %v", checkTimeout))
-			return ctrl.Result{RequeueAfter: 0}, nil // Don't requeue
+			return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
 		}

183-185: ⚠️ Potential issue | 🟠 Major

Deployment-condition update errors are swallowed

At Line 183-Line 185, markNetworkObservabilityDeployed failure is only logged. That can cause repeated deploy-path execution on later reconciles instead of retrying the status write immediately.

Suggested fix
 	// Mark as deployed in Network CR status
 	if err := r.markNetworkObservabilityDeployed(ctx, &network); err != nil {
-		klog.Warningf("Failed to update Network Observability deployment status: %v", err)
+		r.status.SetDegraded(statusmanager.ObservabilityConfig, "MarkDeployedError", fmt.Sprintf("Failed to update Network Observability deployment status: %v", err))
+		return ctrl.Result{}, err
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller.go` around lines 183 -
185, The call to markNetworkObservabilityDeployed is currently logging failures
and swallowing the error, which prevents the reconcile loop from retrying status
writes; change the code in observability_controller.go so that if
r.markNetworkObservabilityDeployed(ctx, &network) returns an error you propagate
that error (e.g., return fmt.Errorf("markNetworkObservabilityDeployed: %w", err)
or wrap with kerrors) instead of only calling klog.Warningf so the reconciler
will requeue and retry the status update; locate the
markNetworkObservabilityDeployed call in the reconcile flow and replace the
swallowed-log branch with an error return (or requeue result) so failures are
retried immediately.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/controller/observability/observability_controller.go`:
- Around line 326-338: The readiness check in waitForNetObservOperator currently
returns after inspecting the first matching csv in csvs.Items which can be
non-deterministic; instead collect all CSVs whose name has the
"netobserv-operator" prefix, determine the intended CSV by selecting the one
with the highest semantic version parsed from the name (e.g., parse the version
suffix from csv.GetName() like "netobserv-operator.vX.Y.Z" and compare using
semver semantics), and then call unstructured.NestedString on that selected
CSV's Object to check status.phase; update the logic around csvs.Items,
csv.GetName(), and unstructured.NestedString to implement this deterministic
selection before returning phase == "Succeeded".

---

Duplicate comments:
In `@pkg/controller/observability/observability_controller.go`:
- Around line 183-185: The call to markNetworkObservabilityDeployed is currently
logging failures and swallowing the error, which prevents the reconcile loop
from retrying status writes; change the code in observability_controller.go so
that if r.markNetworkObservabilityDeployed(ctx, &network) returns an error you
propagate that error (e.g., return fmt.Errorf("markNetworkObservabilityDeployed:
%w", err) or wrap with kerrors) instead of only calling klog.Warningf so the
reconciler will requeue and retry the status update; locate the
markNetworkObservabilityDeployed call in the reconcile flow and replace the
swallowed-log branch with an error return (or requeue result) so failures are
retried immediately.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 63c37a51-4f01-487a-9005-0b35a6728d9f

📥 Commits

Reviewing files that changed from the base of the PR and between 9cc8e57 and 5bb3220.

📒 Files selected for processing (1)
  • pkg/controller/observability/observability_controller.go

Comment on lines +326 to +338
for _, csv := range csvs.Items {
name := csv.GetName()
// CSV names are typically like "netobserv-operator.v1.2.3"
if strings.HasPrefix(name, "netobserv-operator") {
phase, found, err := unstructured.NestedString(csv.Object, "status", "phase")
if err != nil {
return false, err
}
if !found {
return false, nil
}
return phase == "Succeeded", nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

CSV readiness check can pick the wrong CSV

waitForNetObservOperator returns after the first netobserv-operator* CSV (Line 326-Line 338). During upgrades there can be multiple CSVs; first item ordering is not stable, so readiness can be misdetected and timeout incorrectly.

Suggested fix
-		// Find the netobserv operator CSV
+		// Find any netobserv operator CSV in Succeeded phase
 		for _, csv := range csvs.Items {
 			name := csv.GetName()
 			// CSV names are typically like "netobserv-operator.v1.2.3"
 			if strings.HasPrefix(name, "netobserv-operator") {
 				phase, found, err := unstructured.NestedString(csv.Object, "status", "phase")
 				if err != nil {
 					return false, err
 				}
-				if !found {
-					return false, nil
-				}
-				return phase == "Succeeded", nil
+				if found && phase == "Succeeded" {
+					return true, nil
+				}
 			}
 		}
 		return false, nil
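The stricter alternative raised in the review, picking the highest-versioned CSV deterministically instead of accepting whichever matching item comes first, can be sketched with stdlib-only version parsing (the helper names and the standalone shape are illustrative; real code would operate on `csvs.Items` and then check `status.phase` on the selected CSV):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion extracts {major, minor, patch} from a CSV name such as
// "netobserv-operator.v1.2.3"; ok is false if the suffix is not semver-like.
func parseVersion(name string) (v [3]int, ok bool) {
	i := strings.LastIndex(name, ".v")
	if i < 0 {
		return v, false
	}
	parts := strings.Split(name[i+2:], ".")
	if len(parts) != 3 {
		return v, false
	}
	for j, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, false
		}
		v[j] = n
	}
	return v, true
}

func less(a, b [3]int) bool {
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// pickLatest returns the highest-versioned name with the
// "netobserv-operator" prefix, so readiness is checked against the
// intended CSV even when an older one lingers mid-upgrade.
func pickLatest(names []string) string {
	var best string
	var bestV [3]int
	found := false
	for _, name := range names {
		if !strings.HasPrefix(name, "netobserv-operator") {
			continue
		}
		v, ok := parseVersion(name)
		if !ok {
			continue
		}
		if !found || less(bestV, v) {
			best, bestV, found = name, v, true
		}
	}
	return best
}

func main() {
	fmt.Println(pickLatest([]string{
		"netobserv-operator.v1.9.0",
		"netobserv-operator.v1.10.2",
		"other-operator.v9.9.9",
	})) // numeric comparison: 1.10.2 beats 1.9.0
}
```

Note that numeric comparison matters here: a lexicographic sort would rank "v1.9.0" above "v1.10.2".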

…ered

Check if the feature gate exists before calling Enabled() to prevent panics on clusters
running older API versions that don't have the feature gate registered yet.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
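The pattern this commit describes, verifying that a gate is registered before asking whether it is enabled, can be sketched with a stdlib stand-in (the `gate` type below only mimics the shape of a feature-gate accessor whose `Enabled` panics on unknown gates; it is not the library-go API):

```go
package main

import "fmt"

// gate mimics a feature-gate accessor whose Enabled panics when asked
// about a gate it does not know.
type gate struct{ enabled map[string]bool }

func (g gate) KnownFeatures() []string {
	keys := make([]string, 0, len(g.enabled))
	for k := range g.enabled {
		keys = append(keys, k)
	}
	return keys
}

func (g gate) Enabled(name string) bool {
	v, ok := g.enabled[name]
	if !ok {
		panic("unknown feature gate: " + name)
	}
	return v
}

// safeEnabled only calls Enabled for gates the cluster registered,
// treating unregistered gates as disabled instead of panicking.
func safeEnabled(g gate, name string) bool {
	for _, known := range g.KnownFeatures() {
		if known == name {
			return g.Enabled(name)
		}
	}
	return false
}

func main() {
	g := gate{enabled: map[string]bool{"NetworkObservability": true}}
	fmt.Println(safeEnabled(g, "NetworkObservability"), safeEnabled(g, "NotRegistered")) // true false
}
```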
@coderabbitai coderabbitai bot left a comment
🧹 Nitpick comments (1)
pkg/controller/observability/observability_controller.go (1)

133-146: Namespace creation handles not-found but not already-exists race.

The Get-then-Create pattern at lines 136-146 could encounter an AlreadyExists error if another process creates the namespace between the Get and Create calls. This would fail the reconciliation, though it will succeed on retry.

♻️ Optional: handle AlreadyExists gracefully
 		if errors.IsNotFound(err) {
 			if err := r.client.Create(ctx, ns); err != nil {
+				if errors.IsAlreadyExists(err) {
+					klog.V(4).Infof("Namespace %s was created by another process", OperatorNamespace)
+				} else {
 				r.status.SetDegraded(statusmanager.ObservabilityConfig, "CreateNamespaceError", fmt.Sprintf("Failed to create namespace %s: %v", OperatorNamespace, err))
 				return ctrl.Result{}, err
+				}
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller.go` around lines 133 -
146, The Get-then-Create namespace flow in Reconcile
(observability_controller.go) can race: after r.client.Get succeeds with
NotFound, r.client.Create may return AlreadyExists if another actor created the
namespace; update the Create error handling in the block that constructs ns
(using OperatorNamespace) to treat errors.IsAlreadyExists(err) as non-fatal (do
not call r.status.SetDegraded or return error) and only set degraded/return for
other errors; keep using r.status.SetDegraded(statusmanager.ObservabilityConfig,
...) for real create failures so reconciliation continues cleanly on races.
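The tolerated-race shape described above can be sketched with a toy in-memory store (the sentinel `errAlreadyExists` and the `store` type are stand-ins; real code would check `apierrors.IsAlreadyExists` on the client's Create error):

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for a Kubernetes AlreadyExists API error.
var errAlreadyExists = errors.New("already exists")

// store simulates an API server where another actor may have created
// the namespace between our Get (NotFound) and our Create.
type store struct{ existing map[string]bool }

func (s *store) create(name string) error {
	if s.existing[name] {
		return errAlreadyExists
	}
	s.existing[name] = true
	return nil
}

// ensureNamespace treats a lost create race as success; only other
// errors are surfaced (and would set a degraded condition).
func ensureNamespace(s *store, name string) error {
	if err := s.create(name); err != nil {
		if errors.Is(err, errAlreadyExists) {
			// Another process created it first; the desired state holds.
			return nil
		}
		return fmt.Errorf("failed to create namespace %s: %w", name, err)
	}
	return nil
}

func main() {
	s := &store{existing: map[string]bool{"netobserv": true}}
	fmt.Println(ensureNamespace(s, "netobserv")) // <nil>: the race is tolerated
}
```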

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c15118e0-7f89-43e5-a75e-3ba878565507

📥 Commits

Reviewing files that changed from the base of the PR and between 5bb3220 and 2a76763.

📒 Files selected for processing (1)
  • pkg/controller/observability/observability_controller.go

Align struct field colons to match project code style standards.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment
♻️ Duplicate comments (1)
pkg/controller/observability/observability_controller_test.go (1)

1268-1283: ⚠️ Potential issue | 🟡 Minor

Race condition in concurrent test remains unaddressed.

Calling g.Expect() inside goroutines can cause test panics because Gomega's fail handler invokes t.FailNow(), which must only be called from the test's main goroutine.

🔧 Suggested fix using error channel
 	// Run 5 concurrent reconciliations
 	done := make(chan bool, 5)
+	errChan := make(chan error, 5)
 	for i := 0; i < 5; i++ {
 		go func() {
 			_, err := r.Reconcile(context.TODO(), req)
-			// All should complete without error (idempotent)
-			g.Expect(err).NotTo(HaveOccurred())
+			errChan <- err
 			done <- true
 		}()
 	}

 	// Wait for all to complete
 	for i := 0; i < 5; i++ {
 		<-done
+		err := <-errChan
+		g.Expect(err).NotTo(HaveOccurred())
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller_test.go` around lines
1268 - 1283, The test invokes g.Expect inside goroutines which can call
t.FailNow from non-test goroutines; instead collect errors from r.Reconcile in a
buffered channel and perform Gomega assertions in the main test goroutine.
Replace the done channel with an errChan (buffered to 5), have each goroutine
call r.Reconcile(context.TODO(), req) and send the returned error into errChan,
then loop 5 times in the main goroutine to receive err := <-errChan and call
g.Expect(err).NotTo(HaveOccurred()). This keeps the Reconcile calls concurrent
but ensures assertions (using g.Expect) run only on the main test goroutine.
🧹 Nitpick comments (2)
pkg/controller/observability/observability_controller_test.go (2)

936-938: Misleading test name and comment.

The test is named TestReconcile_FlowCollectorDeleted with comment "tests that reconciliation recreates FlowCollector if it gets deleted," but the actual assertion verifies that reconciliation skips reinstallation when the deployed condition is set. The test logic is correct per the PR's design (avoid reinstallation once deployed), but the name/comment should reflect this behavior.

♻️ Suggested rename for clarity
-// TestReconcile_FlowCollectorDeleted tests that reconciliation recreates
-// FlowCollector if it gets deleted
-func TestReconcile_FlowCollectorDeleted(t *testing.T) {
+// TestReconcile_SkipsReinstallWhenFlowCollectorDeleted tests that reconciliation
+// does NOT recreate FlowCollector after deletion if the deployed condition is set
+func TestReconcile_SkipsReinstallWhenFlowCollectorDeleted(t *testing.T) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller_test.go` around lines
936 - 938, Rename the test function TestReconcile_FlowCollectorDeleted and
update its comment to reflect the actual behavior being asserted: that
reconciliation skips reinstallation when the FlowCollector has the "deployed"
condition set. Specifically, change the function name (e.g., to
TestReconcile_SkipsReinstallWhenFlowCollectorDeployed) and update the preceding
comment to state "tests that reconciliation skips reinstallation of
FlowCollector when the deployed condition is present"; also update any
references to the old test name in the file (including test registration or
helper calls) so they remain consistent.

1530-1536: Test creates files in working directory without full cleanup.

The test creates manifests/ directory and writes a file but only removes the file on cleanup, not the directory. While manifests/ likely already exists in the repository, consider using t.TempDir() for full isolation, especially if tests run in parallel.

♻️ Suggested fix using temp directory
-	err := os.MkdirAll("manifests", 0755)
-	g.Expect(err).NotTo(HaveOccurred())
-
-	// Create the FlowCollector manifest at the expected path
-	err = os.WriteFile(FlowCollectorYAML, []byte(flowCollectorManifest), 0644)
-	g.Expect(err).NotTo(HaveOccurred())
-	defer os.Remove(FlowCollectorYAML)
+	// Use temp directory and override the manifest path for this test
+	tmpDir := t.TempDir()
+	manifestPath := filepath.Join(tmpDir, "flowcollector.yaml")
+	err := os.WriteFile(manifestPath, []byte(flowCollectorManifest), 0644)
+	g.Expect(err).NotTo(HaveOccurred())
+	
+	// Note: This requires the controller to accept a configurable manifest path
+	// or use a test-specific approach to override FlowCollectorYAML

Alternatively, if modifying the controller isn't feasible, ensure the manifests directory cleanup:

+	defer os.RemoveAll("manifests")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/observability/observability_controller_test.go` around lines
1530 - 1536, The test creates a real manifests/ directory and only removes the
file, leaving the directory behind; update the test to use an isolated temp
directory so artifacts are fully cleaned up: create a temp dir via t.TempDir()
(or os.MkdirTemp if t is not available), write the FlowCollector manifest into
filepath.Join(tempDir, filepath.Base(FlowCollectorYAML)) using the existing
flowCollectorManifest, and set the test to use that path (or defer
os.RemoveAll(tempDir)) instead of writing to the repository-level "manifests"
directory; alternatively ensure the test defers os.RemoveAll("manifests") after
writing—reference FlowCollectorYAML and flowCollectorManifest to locate the
write site in observability_controller_test.go.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 394e355a-3990-4ada-aaac-e4a5ee489446

📥 Commits

Reviewing files that changed from the base of the PR and between 2a76763 and 9548585.

📒 Files selected for processing (1)
  • pkg/controller/observability/observability_controller_test.go

@openshift-ci
Contributor

openshift-ci bot commented Mar 19, 2026

@OlivierCazade: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-hypershift-conformance | 9548585 | link | true | /test e2e-aws-ovn-hypershift-conformance |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp | 9548585 | link | true | /test e2e-metal-ipi-ovn-dualstack-bgp |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw | 9548585 | link | true | /test e2e-metal-ipi-ovn-dualstack-bgp-local-gw |
| ci/prow/security | 9548585 | link | false | /test security |
| ci/prow/e2e-aws-ovn-rhcos10-techpreview | 9548585 | link | false | /test e2e-aws-ovn-rhcos10-techpreview |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
