[OCPERT-264][hold]Test OTE migration in single mode with Dockerfile integration#742
[OCPERT-264][hold]Test OTE migration in single mode with Dockerfile integration#742ming1013 wants to merge 2 commits intoopenshift:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Test result: $ ./tests-extension/bin/router-tests-ext list | grep sig- $ ./tests-extension/bin/router-tests-ext list | grep sig - |wc -l $ ./tests-extension/bin/router-tests-ext run-test "[OTP][sig-network-edge] Network_Edge Component_Router [Level0] Author:shudili-Critical-41110-The threadCount ingresscontroller parameter controls the nbthread option for the haproxy router"
|
|
/hold |
…meter only when non-empty and removed the -nometadata flag from bindata generation to preserve file permissions, resolving the fail to process error in both router and router_clean directories.
WalkthroughThis pull request introduces a comprehensive test extension framework for the OpenShift router. It adds a new Docker build stage, Go module structure, test entrypoint, and extensive end-to-end test suites covering AWS load balancers, DNS operations, external DNS, routing behavior, TLS, and platform features. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
|
$ ./bin/router-tests-ext run-test "[OTP][sig-network-edge] Network_Edge Component_Router [Level0] Author:shudili-High-55367-Default HAProxy maxconn value to 50000 for OCP 4.12 and later" Random Seed: 1773805897 - will randomize all specs Will run 1 of 1 specs [OTP][sig-network-edge] Network_Edge Component_Router [Level0] Author:shudili-High-55367-Default HAProxy maxconn value to 50000 for OCP 4.12 and later I0318 11:52:28.701871 50991 util.go:612] Found the given string maxconn 50000 in haproxy.config STEP: Patch tuningOptions/maxConnections with null to the ingress-controller 03/18/26 11:52:28.701 I0318 11:52:51.435134 50991 util.go:612] Found the given string maxconn 50000 in haproxy.config STEP: Patch tuningOptions/maxConnections 50000 to the ingress-controller 03/18/26 11:52:51.435 I0318 11:53:20.358721 50991 util.go:612] Found the given string maxconn 50000 in haproxy.config I0318 11:53:20.358815 50991 client.go:1013] Running 'oc --kubeconfig=/home/minl/openshift/auth/kubeconfig/kubeconfig delete --ignore-not-found -n openshift-ingress-operator ingresscontroller ocp55367' Ran 1 of 1 Specs in 155.509 seconds |
There was a problem hiding this comment.
Actionable comments posted: 2
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (31)
tests-extension/test/e2e/testdata/router/extdns/subscription.yaml-10-11 (1)
10-11:⚠️ Potential issue | 🟠 MajorHardcoded QE catalog source will cause test failures in non-QE environments
The
qe-app-registrysource is hardcoded in the Subscription resource. This breaks test execution outside QE environments since the catalog source won't exist, failing Subscription resolution and blocking test setup. The issue extends beyond this file—the same hardcoding exists inawslb/subscription-src-qe.yamlandawslb/subscription-src-qe-sts.yaml.The
extdns/subscription.yamlfilename (without "-qe" suffix) suggests it's intended as shared testdata, unlike the explicitly-namedsubscription-src-qe.yamlfiles. Either make the source configurable or rename and document this fixture as QE-only.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/testdata/router/extdns/subscription.yaml` around lines 10 - 11, The Subscription resource hardcodes the catalog source "qe-app-registry", which breaks non-QE runs; update extdns/subscription.yaml to remove the hardcoded value by making the source configurable (e.g., replace "qe-app-registry" with a placeholder like ${CATALOG_SOURCE} or a template variable such as {{ .Values.catalogSource }}) and wire the test harness to set that env/template value at test runtime, and for the QE-only fixtures (awslb/subscription-src-qe.yaml and awslb/subscription-src-qe-sts.yaml) either rename them to include a "-qe" suffix and document they are QE-only or keep them but also convert them to use the same configurable placeholder so all subscription fixtures behave consistently.tests-extension/test/e2e/tuning.go-347-356 (1)
347-356:⚠️ Potential issue | 🟠 MajorTest logic for over-max healthCheckInterval appears flawed.
The test patches with an over-max value
2147483900ms, then expects generation to remain at"2"(line 351), but subsequently callsgetOneNewRouterPodFromRollingUpdate(line 354) which implies a rolling update occurred. This is contradictory:
- If the patch is rejected, generation stays at 2, no rolling update occurs, and the previous value should persist.
- If the patch is accepted (with clamping), generation should increment to
"3", triggering a rolling update.The test cannot have it both ways.
Suggested fix: expect generation 3 if clamping is applied
exutil.By("7: Try to patch tuningOptions/healthCheckInterval 2147483900ms which is larger than the max healthCheckInterval, to the ingress-controller") NegHealthCheckInterval := "2147483900ms" patchResourceAsAdmin(oc, ingctrl1.namespace, ingctrlResource1, "{\"spec\": {\"tuningOptions\": {\"healthCheckInterval\": \""+NegHealthCheckInterval+"\"}}}") - ensureRouterDeployGenerationIs(oc, ingctrl1.name, "2") + ensureRouterDeployGenerationIs(oc, ingctrl1.name, "3") exutil.By("8: Check ROUTER_BACKEND_CHECK_INTERVAL env in a route pod which should be the max: 2147483647ms")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/tuning.go` around lines 347 - 356, The test currently patches tuningOptions.healthCheckInterval with NegHealthCheckInterval ("2147483900ms") but inconsistently asserts ensureRouterDeployGenerationIs(..., "2") while later calling getOneNewRouterPodFromRollingUpdate and checking the env; update the test to reflect the clamping behavior by changing the generation expectation to "3" (or otherwise assert that a rolling update occurred) so the flow is consistent: after patchResourceAsAdmin(oc, ingctrl1.namespace, ingctrlResource1, ... NegHealthCheckInterval) call ensureRouterDeployGenerationIs(oc, ingctrl1.name, "3") before calling getOneNewRouterPodFromRollingUpdate and readRouterPodEnv to verify ROUTER_BACKEND_CHECK_INTERVAL was clamped to "2147483647ms".tests-extension/test/e2e/tuning.go-610-621 (1)
610-621:⚠️ Potential issue | 🟠 MajorComment/code mismatch and inconsistent maxconn assertion.
- Line 610 comment says "50000" but line 611 sets
maxConnections = "500000".- More importantly: Lines 615-618 correctly assert
ROUTER_MAX_CONNECTIONS=500000, but line 621 expectsmaxconn 50000in haproxy.config. If the env var is500000, the haproxy config should reflect the same value.Suggested fix
- exutil.By("Patch tuningOptions/maxConnections 50000 to the ingress-controller") + exutil.By("Patch tuningOptions/maxConnections 500000 to the ingress-controller") maxConnections = "500000" patchResourceAsAdmin(oc, ingctrl.namespace, ingctrlResource, "{\"spec\": {\"tuningOptions\": {\"maxConnections\": "+maxConnections+"}}}") ensureRouterDeployGenerationIs(oc, ingctrl.name, "2") exutil.By("Check ROUTER_MAX_CONNECTIONS env in a route pod which should be " + maxConnections) podname = getOneNewRouterPodFromRollingUpdate(oc, ingctrl.name) maxConnSearch := readRouterPodEnv(oc, podname, "ROUTER_MAX_CONNECTIONS") o.Expect(maxConnSearch).To(o.ContainSubstring("ROUTER_MAX_CONNECTIONS=" + maxConnections)) - exutil.By("Check maxconn in haproxy.config which should be 50000") - ensureHaproxyBlockConfigContains(oc, podname, "maxconn", []string{"maxconn 50000"}) + exutil.By("Check maxconn in haproxy.config which should be 500000") + ensureHaproxyBlockConfigContains(oc, podname, "maxconn", []string{"maxconn 500000"})🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/tuning.go` around lines 610 - 621, The test is inconsistent: the comment, the maxConnections variable, and the haproxy assertion disagree; make them consistent by using the maxConnections variable everywhere and updating the human-readable comment to match its value — e.g., set the comment in the exutil.By call to include maxConnections (or its current value) and replace the hard-coded haproxy expectation in ensureHaproxyBlockConfigContains("maxconn", []string{"maxconn 50000"}) with a dynamic expectation built from maxConnections (e.g., []string{"maxconn " + maxConnections}); keep other calls (patchResourceAsAdmin, readRouterPodEnv, getOneNewRouterPodFromRollingUpdate) as-is.tests-extension/test/e2e/tls.go-451-452 (1)
451-452:⚠️ Potential issue | 🟠 MajorGuard the certificate date parsing before indexing it.
If curl's verbose output stops emitting a
start dateline, this panics instead of failing with context.🧷 Example fix
dateRe := regexp.MustCompile("(start date.*)") - certStartDate := dateRe.FindAllString(string(statsOut), -1) + certStartDate := dateRe.FindString(statsOut) + o.Expect(certStartDate).NotTo(o.BeEmpty(), "expected curl verbose output to include a certificate start date")- certStartDate1 := dateRe.FindAllString(string(statsOut1), -1) - // Cross check the start date of the ceritificate is not same after reloading - o.Expect(certStartDate1[0]).NotTo(o.Equal(certStartDate[0])) + certStartDate1 := dateRe.FindString(statsOut1) + o.Expect(certStartDate1).NotTo(o.BeEmpty(), "expected curl verbose output to include a certificate start date after reload") + // Cross check the start date of the certificate is not same after reloading + o.Expect(certStartDate1).NotTo(o.Equal(certStartDate))Also applies to: 470-472
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/tls.go` around lines 451 - 452, The code currently assumes dateRe.FindAllString(string(statsOut), -1) returns at least one element and directly indexes certStartDate[0], which can panic if no "start date" line exists; update the parsing in the tls.go test to check the result length before indexing (e.g., verify len(certStartDate) > 0) and fail the test with a clear error message or skip the assertion if missing; apply the same guard to the analogous parsing at lines referenced (the second occurrence around 470-472) so both uses of dateRe, certStartDate, and statsOut are defensive and provide contextual failure output instead of panicking.tests-extension/test/e2e/tls.go-515-526 (1)
515-526:⚠️ Potential issue | 🟠 MajorWait for the generated route and make the lookup deterministic.
Ingress-to-route reconciliation is async. Both tests list routes once and immediately use
routeNames[0]; that can panic on an empty list, and the TLS-cert case also reads annotations/host before the route has necessarily settled.⏳ Example fix
- routeNames := getResourceName(oc, ns, "route") + var routeNames []string + o.Eventually(func() int { + routeNames = getResourceName(oc, ns, "route") + return len(routeNames) + }).Should(o.Equal(1))- output := getByJsonPath(oc, ns, "route/"+routeNames[0], "{.metadata.annotations}") - o.Expect(output).Should(o.ContainSubstring(`"route.openshift.io/destination-ca-certificate-secret":"service-secret"`)) - output = getByJsonPath(oc, ns, "route/"+routeNames[0], "{.spec.host}") - o.Expect(output).Should(o.ContainSubstring(`service-secure-%s.ocp51980.%s`, ns, baseDomain)) + waitForOutputContains(oc, ns, "route/"+routeNames[0], "{.metadata.annotations}", `"route.openshift.io/destination-ca-certificate-secret":"service-secret"`) + waitForOutputContains(oc, ns, "route/"+routeNames[0], "{.spec.host}", fmt.Sprintf(`service-secure-%s.ocp51980.%s`, ns, baseDomain))Also applies to: 576-588
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/tls.go` around lines 515 - 526, The test is reading routeNames[0] immediately after calling getRoutes which can be empty due to async ingress->route reconciliation; make the lookup deterministic by polling until a route is present and stable before using routeNames[0]. Specifically, after calling getRoutes(ns) use a short retry loop that calls getResourceName(oc, ns, "route") until the returned list length > 0 (and optionally stable across two successive polls), then assign routeNames := that result and proceed to call waitForOutputContains and waitForOutputEquals for that specific route; ensure the same pattern is applied to the TLS-cert case (the block using getRoutes/getResourceName and later waitForOutput* around routeNames[0]).tests-extension/test/e2e/tls.go-225-227 (1)
225-227:⚠️ Potential issue | 🟠 MajorWait for route validation before asserting the failure count.
ExtendedValidationFailedis set asynchronously. Reading all routes immediately after the fourcreateRoutecalls makes this spec race controller reconciliation and fail intermittently.⏳ Example fix
- routeOutput := getRoutes(oc, ns) - o.Expect(strings.Count(routeOutput, `ExtendedValidationFailed`) == 4).To(o.BeTrue()) + o.Eventually(func() int { + return strings.Count(getRoutes(oc, ns), "ExtendedValidationFailed") + }).Should(o.Equal(4))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/tls.go` around lines 225 - 227, The test reads routeOutput immediately after the four createRoute calls and races controller reconciliation; replace the direct assertion using getRoutes(oc, ns) with a polling/assert-with-timeout that repeatedly calls getRoutes and checks strings.Count(routeOutput, `ExtendedValidationFailed`) == 4 until it becomes true or a sensible timeout elapses (use the test suite's wait helper like wait.PollImmediate or Gomega's Eventually), referencing the existing getRoutes function, the `ExtendedValidationFailed` marker, and the createRoute sequence so the spec waits for the controller to finish before asserting.tests-extension/test/e2e/tls.go-437-440 (1)
437-440:⚠️ Potential issue | 🟠 MajorDon't convert arbitrary
oc getfailures into a skip.
|| err != nilhides API/RBAC/transient failures as skips. Skip only when auth is actually absent and assert everything else.🛑 Example fix
output, err := oc.WithoutNamespace().AsAdmin().Run("get").Args("route", "oauth-openshift", "-n", "openshift-authentication").Output() - if strings.Contains(output, "namespaces \"openshift-authentication\" not found") || err != nil { - g.Skip("This cluster dont have authentication operator, so skipping the test.") + if err != nil { + if strings.Contains(output, `namespaces "openshift-authentication" not found`) || + strings.Contains(output, `routes.route.openshift.io "oauth-openshift" not found`) { + g.Skip("This cluster does not have the authentication operator, so skipping the test.") + } + o.Expect(err).NotTo(o.HaveOccurred()) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/tls.go` around lines 437 - 440, The current check converts any error from the oc get command into a test skip by using "|| err != nil"; change the logic to only skip when the output explicitly indicates the authentication namespace is missing (strings.Contains(output, "namespaces \"openshift-authentication\" not found")), and handle other errors as test failures: call g.Fatalf (or the test framework's fail/assert helper) with the err details when oc.WithoutNamespace().AsAdmin().Run(...).Output() returns a non-nil err so API/RBAC/transient failures are not masked as skips while keeping the original g.Skip when the namespace is truly absent.tests-extension/test/e2e/logging.go-44-49 (1)
44-49:⚠️ Potential issue | 🟠 MajorMake the ingress controller names run-unique.
These specs create fixed names in the shared
openshift-ingress-operatornamespace (ocp34166,ocp34178, etc.). If a run is interrupted or two jobs hit the same cluster, the next create will fail withAlreadyExistsbefore the deferred cleanup can help. Add a short unique suffix or delete any stale resource first.Also applies to: 127-132, 189-196, 261-266, 340-345
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/logging.go` around lines 44 - 49, The test creates fixed ingress controller names (e.g., the ingressControllerDescription instances with name "ocp34166", "ocp34178", etc.) which can cause AlreadyExists failures on concurrent or interrupted runs; update the test to generate run-unique names (for example append a short random/UUID/time-based suffix to ingressControllerDescription.name when constructing ingctrl and the other affected instances at lines referenced) or ensure any existing resource is deleted before create (call the cleanup/delete path for the same name). Locate usages of ingressControllerDescription and the variables like ingctrl and apply the unique-suffix strategy (or pre-delete) consistently for the other occurrences (lines ~127-132, 189-196, 261-266, 340-345).tests-extension/test/e2e/logging.go-76-78 (1)
76-78:⚠️ Potential issue | 🟠 MajorReplace IPv4-only pod address lookup with cluster-aware alternative.
Every spec retrieves pod IP via
getPodv4Address(...), which only works on IPv4 clusters. This makes the suite fail on IPv6-only clusters and leaves dual-stack runs untested for IPv6 paths. The codebase already has infrastructure for detecting cluster IP stack (ipv6single, dualstack) and retrieving the appropriate address family. Adopt a similar pattern to select the correct pod address based on cluster configuration.Affected occurrences in logging.go
Lines 76–78, 159–161, 227–229, 293–295, 373–375
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/logging.go` around lines 76 - 78, Replace the IPv4-only call to getPodv4Address with a cluster-aware lookup: detect the cluster stack using the existing ipv6single and dualstack flags and call getPodv6Address when the cluster is IPv6-only (ipv6single) (or choose IPv6 when dualstack tests require it), otherwise fall back to getPodv4Address; update the uses in the logging spec (the call that sets podIP used to build toDst and curlCmd) to use the selected podIP so routehost + ":80:" + podIP remains correct, and apply the same change for all other occurrences that call getPodv4Address in this file.tests-extension/test/e2e/testdata/router/coreDNS-pod.yaml-67-69 (1)
67-69:⚠️ Potential issue | 🟠 MajorAvoid a public resolver in this fixture.
Hard-coding
8.8.8.8makes the test depend on internet egress and Google DNS availability, which will fail on disconnected or egress-restricted clusters for reasons unrelated to router behavior.One safer default
- forward . 8.8.8.8 { + forward . /etc/resolv.conf {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/testdata/router/coreDNS-pod.yaml` around lines 67 - 69, The CoreDNS fixture currently forwards all queries to the public resolver "8.8.8.8", which makes tests rely on external internet/DNS; replace that hard-coded public resolver in the "forward . 8.8.8.8" stanza with a non-public/local resolver used in CI (for example the cluster DNS service name or IP such as kube-dns.kube-system.svc.cluster.local or 127.0.0.1:53) or remove the forward stanza and use a local stub resolver for tests; update the "forward" line in this block so it no longer references 8.8.8.8 to avoid egress-dependency.tests-extension/test/e2e/testdata/router/extdns/sts-exdns-perms-policy.json-5-22 (1)
5-22:⚠️ Potential issue | 🟠 MajorScope Route53 write access to a specific hosted zone.
The permissions policy allows
route53:ChangeResourceRecordSetson all hosted zones in the AWS account (hostedzone/*). This policy is read from the JSON file and attached as-is without modification, meaning a test failure could mutate unrelated DNS zones. This should be scoped to the specific hosted zone under test.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/testdata/router/extdns/sts-exdns-perms-policy.json` around lines 5 - 22, The policy currently grants route53:ChangeResourceRecordSets to "arn:aws:route53:::hostedzone/*" which allows changes to all hosted zones; narrow this by replacing that Resource value with the specific hosted zone ARN "arn:aws:route53:::hostedzone/<HOSTED_ZONE_ID>" (use your test-hosted-zone id variable such as TEST_HOSTED_ZONE_ID or inject from the test fixture) so only the zone under test is writable, and update the test setup/template that reads sts-exdns-perms-policy.json to substitute the actual HOSTED_ZONE_ID at runtime (or generate the policy JSON dynamically) so the policy attached during tests is scoped to that single hosted zone.tests-extension/test/e2e/testdata/router/coreDNS-pod.yaml-8-27 (1)
8-27:⚠️ Potential issue | 🟠 MajorDrop privilege escalation from this pod.
Line 22 makes the container harder to admit under restricted SCC/PSA, and Line 9 explicitly opts out of non-root enforcement. That should not be needed just to bind
:53whenNET_BIND_SERVICEis already granted.Minimal hardening
securityContext: - allowPrivilegeEscalation: true + allowPrivilegeEscalation: false🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/testdata/router/coreDNS-pod.yaml` around lines 8 - 27, The pod spec currently disables non-root enforcement and allows privilege escalation which is unnecessary for binding :53; update the coredns container securityContext (container name "coredns") to remove or set runAsNonRoot to true (replace the explicit false), and set allowPrivilegeEscalation to false while keeping only the needed capability NET_BIND_SERVICE under capabilities.add and preserving capabilities.drop: - ALL; make these changes on the securityContext blocks (pod-level securityContext and containers[].securityContext) to comply with restricted SCC/PSA.tests-extension/go.mod-261-261 (1)
261-261:⚠️ Potential issue | 🟠 MajorUpgrade
go.opentelemetry.io/otel/sdkto v1.40.0 or later.
go.opentelemetry.io/otel/sdkv1.37.0 is vulnerable to GHSA-9h8m-3fm2-qjrq, a HIGH-severity arbitrary code execution vulnerability affecting versions 1.21.0–1.39.0 on macOS via PATH hijacking in theioreginvocation. Patched versions (v1.40.0+) are available and should be used immediately.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/go.mod` at line 261, Dependency go.opentelemetry.io/otel/sdk is pinned to v1.37.0 which is vulnerable; update the module requirement to v1.40.0 or later (e.g., v1.40.0) and run the Go module update sequence (go get go.opentelemetry.io/otel/sdk@v1.40.0 and go mod tidy) so the lockfile and transitive deps reflect the patched version; search for the module name "go.opentelemetry.io/otel/sdk" in go.mod to locate the entry to change.tests-extension/go.mod-213-215 (1)
213-215:⚠️ Potential issue | 🟠 MajorUpgrade
github.com/opencontainers/runcto v1.2.8 or later to address container escape vulnerabilities.
runcv1.2.5 is vulnerable to CVE-2025-31133 and CVE-2025-52565 (both HIGH severity, CVSS ≥7.5), which exploit insufficient verification inmaskedPathshandling and/dev/ptsbind-mounting. These enable container escape and host DoS. Update to v1.2.8 (or v1.3.3+/v1.4.0-rc.3+) to remediate.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/go.mod` around lines 213 - 215, The go.mod lists the indirect module github.com/opencontainers/runc at v1.2.5 which is vulnerable; update the version string for the module github.com/opencontainers/runc to v1.2.8 (or a later patched release such as v1.3.3 or v1.4.0-rc.3+) in go.mod, then run go mod tidy (and vendor/update dependencies if you vendor) and rebuild/run tests to ensure the new runc version is picked up and no dependency conflicts occur.tests-extension/test/e2e/ipfailover.go-91-93 (1)
91-93:⚠️ Potential issue | 🟠 MajorMark these disruptive ipfailover cases as serial too.
The surrounding comments say both tests can conflict with other ipfailover scenarios, but unlike the neighboring cases they are missing the
[Serial]marker used elsewhere in this file.♻️ Proposed fix
- g.It("Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41027-pod and service automatically switched over to standby when master fails [Disruptive]", func() { + g.It("Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41027-pod and service automatically switched over to standby when master fails [Disruptive] [Serial]", func() { @@ - g.It("Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-High-41030-preemption strategy for keepalived ipfailover [Disruptive]", func() { + g.It("Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-High-41030-preemption strategy for keepalived ipfailover [Disruptive] [Serial]", func() {Also applies to: 249-251
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/ipfailover.go` around lines 91 - 93, Two disruptive ipfailover Ginkgo tests in ipfailover.go are missing the [Serial] marker and should be run serially; update the g.It(...) descriptions to include "[Serial]" so they match surrounding cases. Locate the g.It(...) calls (e.g., the test whose description starts with "Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41027-pod and service automatically switched over to standby when master fails [Disruptive]") and the other similar ipfailover case further down, and prepend or append the literal "[Serial]" inside their string descriptions so the tests are marked serial.tests-extension/test/e2e/route-weight.go-558-559 (1)
558-559:⚠️ Potential issue | 🟠 MajorUse
srvPod2Namein both grep patterns.Line 558 and Line 559 interpolate
srvPod3Nametwice and never matchsrvPod2Name, so this assertion can pass without checking one of the three backends at all.♻️ Proposed fix
- selectedSrvNum := fmt.Sprintf("cat haproxy.config | grep -E \"server pod:ingress-canary|server pod:%s|server pod:%s|server pod:%s\"| wc -l", srvPod1Name, srvPod3Name, srvPod3Name) - selectedWeight1Num := fmt.Sprintf("cat haproxy.config | grep -E \"server pod:ingress-canary|server pod:%s|server pod:%s|server pod:%s\"| grep \"weight 1\" |wc -l", srvPod1Name, srvPod3Name, srvPod3Name) + selectedSrvNum := fmt.Sprintf("cat haproxy.config | grep -E \"server pod:ingress-canary|server pod:%s|server pod:%s|server pod:%s\"| wc -l", srvPod1Name, srvPod2Name, srvPod3Name) + selectedWeight1Num := fmt.Sprintf("cat haproxy.config | grep -E \"server pod:ingress-canary|server pod:%s|server pod:%s|server pod:%s\"| grep \"weight 1\" |wc -l", srvPod1Name, srvPod2Name, srvPod3Name)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/route-weight.go` around lines 558 - 559, The grep patterns used to build selectedSrvNum and selectedWeight1Num mistakenly interpolate srvPod3Name twice instead of including srvPod2Name, so the check never verifies the second backend; update both fmt.Sprintf calls (for selectedSrvNum and selectedWeight1Num) to include srvPod2Name in the middle pattern (server pod:%s) so the three backends used are srvPod1Name, srvPod2Name, srvPod3Name and the weight/selection assertions correctly cover all three servers.tests-extension/test/e2e/ipfailover.go-36-45 (1)
36-45:⚠️ Potential issue | 🟠 MajorFail on setup errors instead of masking them.
These guard clauses drop the errors from node discovery. If either call fails, the spec can skip or continue with empty data instead of surfacing the real cluster/setup problem.
♻️ Proposed fix
- workerNodeCount, _ := exactNodeDetails(oc) + workerNodeCount, err := exactNodeDetails(oc) + o.Expect(err).NotTo(o.HaveOccurred()) if workerNodeCount < 2 { g.Skip("Skipping as we need two worker nodes") } g.By("check the cluster has remote worker profile") - remoteWorkerDetails, _ := oc.AsAdmin().WithoutNamespace().Run("get").Args("nodes", "-l", "kubernetes.io/hostname").Output() + remoteWorkerDetails, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("nodes", "-l", "kubernetes.io/hostname").Output() + o.Expect(err).NotTo(o.HaveOccurred()) if strings.Contains(remoteWorkerDetails, "remote-worker") { g.Skip("Skip as ipfailover currently doesn't support on remote-worker profile") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/ipfailover.go` around lines 36 - 45, The setup is swallowing errors from exactNodeDetails and the oc get nodes call; change both to capture and check their errors (from exactNodeDetails(...) and oc.AsAdmin().WithoutNamespace().Run("get").Args(...).Output()), and if err != nil fail the spec immediately (e.g. g.Fatalf or t.Fatalf) with the error details instead of continuing or skipping; only keep the existing g.Skip when the call succeeds and the output contains "remote-worker". Ensure you reference exactNodeDetails and the oc.AsAdmin().WithoutNamespace().Run("get").Args("nodes", "-l", "kubernetes.io/hostname").Output() call when making the changes.tests-extension/test/e2e/testdata/fixtures.go-24-30 (1)
24-30:⚠️ Potential issue | 🟠 MajorReturn an isolated fixture copy here.
FixturePathreuses the same extracted path for the whole process. Downstream specs mutate files under that tree—for exampletests-extension/test/e2e/cookies.goon Line 52 and Line 123—so later specs can read already-modified fixtures and become order-dependent.♻️ One way to isolate callers
func FixturePath(elem ...string) string { relativePath := filepath.Join(elem...) - targetPath := filepath.Join(fixtureDir, relativePath) - - if _, err := os.Stat(targetPath); err == nil { - return targetPath - } + copyRoot, err := os.MkdirTemp(fixtureDir, "fixture-copy-") + if err != nil { + panic(fmt.Sprintf("failed to create fixture copy directory: %v", err)) + } + targetPath := filepath.Join(copyRoot, relativePath)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/testdata/fixtures.go` around lines 24 - 30, FixturePath currently returns the original extracted fixture tree (targetPath) which is mutated across tests; change it to return an isolated copy by: after confirming targetPath exists, create a new temporary directory (os.MkdirTemp), recursively copy the contents of targetPath into that temp dir (preserving file contents and permissions), and return the temp dir path instead of targetPath; update references to fixtureDir/targetPath inside FixturePath and ensure errors from MkdirTemp and the recursive copy are handled/returned or cause a test-failing panic as appropriate.tests-extension/test/e2e/externaldns_util.go-107-112 (1)
107-112:⚠️ Potential issue | 🟠 MajorMissing bounds check before accessing
HostedZones[0]can cause panic.If no hosted zone matches the domain name,
hostedZoneDetails.HostedZoneswill be empty, and accessing index[0]will panic with an index out of range error.🐛 Proposed fix to add bounds check
hostedZoneDetails, err := route53Client.ListHostedZonesByNameWithContext( context.Background(), &route53.ListHostedZonesByNameInput{ DNSName: aws.String(domainName)}) o.Expect(err).NotTo(o.HaveOccurred()) + o.Expect(hostedZoneDetails.HostedZones).NotTo(o.BeEmpty(), "No hosted zone found for domain: "+domainName) e2e.Logf("The zone Id of the host domain '%s' is '%s'", *hostedZoneDetails.HostedZones[0].Name, *hostedZoneDetails.HostedZones[0].Id) return strings.TrimPrefix(*hostedZoneDetails.HostedZones[0].Id, "/hostedzone/")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/externaldns_util.go` around lines 107 - 112, Add a bounds check after calling route53Client.ListHostedZonesByNameWithContext to ensure hostedZoneDetails.HostedZones is non-empty before accessing [0]; for example assert with o.Expect(hostedZoneDetails.HostedZones).NotTo(o.BeEmpty()) or o.Expect(len(hostedZoneDetails.HostedZones)).To(BeNumerically(">", 0)), then also defensively check that hostedZoneDetails.HostedZones[0].Id and .Name are non-nil before using them in e2e.Logf and the return statement so the test fails cleanly instead of panicking.tests-extension/test/e2e/awslb-operator.go-117-126 (1)
117-126:⚠️ Potential issue | 🟠 MajorLogic error: String length check instead of subnet count validation.
internalSubnetsis a string from the jsonpath output, not a slice.len(internalSubnets) > 1checks if the string has more than 1 character, not if there are 2+ subnets. A single subnet ID like "subnet-12345678" would exceed this character threshold, defeating the intended subnet count validation. The check should parse the subnet list (likely space-separated or formatted as a JSON array) and count the elements to ensure >= 2 subnets are discovered before attempting ALB provisioning.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/awslb-operator.go` around lines 117 - 126, internalSubnets is a string, so replace the len(internalSubnets) > 1 check with explicit parsing and element count: parse the returned internalSubnets (from the oc.AsAdmin().WithoutNamespace().Run("get").Args(...).Output() call) into a []string (handle both JSON array output via json.Unmarshal into []string and plain space/comma-separated output via strings.Fields or strings.Split) and then check len(parsedSubnets) >= 2 before calling waitForLoadBalancerProvision; if the parsed slice has fewer than 2 elements, run the oc.Run("describe").Args("ingress", "ingress-test").Output() branch as currently implemented.tests-extension/test/e2e/cloud_load_balancer.go-87-89 (1)
87-89:⚠️ Potential issue | 🟠 MajorIn-place sed modification of shared fixture file.
Same pattern as other test files - using
sed -i''oncustomTemppermanently modifies the shared fixture file, affecting test isolation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/cloud_load_balancer.go` around lines 87 - 89, The test currently runs an in-place sed (sedCmd using "sed -i'' -e ..." on customTemp via exec.Command), which mutates the shared fixture; change it to edit a temporary copy instead: create a temp file, run sed (or perform the replace in Go) writing output to that temp file, and use the temp path for the rest of the test (or atomically rename back if necessary); ensure exec.Command references the temp path (not customTemp) or replace the exec.Command call with Go string replacement and os.WriteFile so the original fixture (customTemp) is never modified.tests-extension/test/e2e/ingress.go-46-49 (1)
46-49:⚠️ Potential issue | 🟠 MajorSame in-place sed modification pattern affects multiple tests.
This pattern of using
sed -i''on shared fixture files appears in multiple tests in this file (lines 47, 210, 313, 375, 462). All these instances risk test pollution and should be refactored to use temporary copies or in-memory string replacement.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/ingress.go` around lines 46 - 49, The tests modify shared fixture files in-place using sed -i'' which causes test pollution; update the ingress tests (e.g., where routeRandHost is set and createResourceFromFile is called, and similar occurrences around lines referenced) to avoid in-place edits by making a temporary copy or performing the replacement in-memory before passing content to createResourceFromFile (or an equivalent helper), and ensure cleanup; search for other instances (the other occurrences you noted) and apply the same change so tests operate on isolated copies instead of editing the shared fixture files directly.tests-extension/test/e2e/route-admission.go-47-49 (1)
47-49:⚠️ Potential issue | 🟠 MajorIn-place sed modification of shared fixture file.
Same issue as in ingress.go - using
sed -i''oncustomTemppermanently modifies the fixture file. This affects test isolation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/route-admission.go` around lines 47 - 49, The test currently runs an in-place sed command using sedCmd and exec.Command against customTemp which mutates the shared fixture; instead, create a new temporary file (e.g., via os.CreateTemp / ioutil.TempFile) or copy customTemp to a temp path and perform the substitution on that temp file, or perform the string replacement in Go (read file, strings.Replace, write to temp file), then run the test assertions against the temp file and ensure cleanup; update usages of sedCmd/exec.Command to target the temp file variable and do not modify customTemp directly.tests-extension/test/e2e/externaldns.go-69-71 (1)
69-71:⚠️ Potential issue | 🟠 MajorIn-place sed modification of shared fixture file.
Same pattern as other test files - using
sed -i''onsampleAWSpermanently modifies the shared fixture file, affecting test isolation and parallel execution.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/externaldns.go` around lines 69 - 71, The test is modifying the shared fixture file via sedCmd and exec.Command("bash", "-c", sedCmd) which breaks isolation; instead read the contents of sampleAWS into memory (os.ReadFile), create a per-test temp file (os.CreateTemp / os.MkdirTemp), perform the basedomain replacement in Go (strings.Replace or regexp) and write the modified contents to the temp file, then use that temp file path in the rest of the test (replacing uses of sampleAWS); this preserves the original fixture and avoids in-place edits by sedCmd/exec.Command.tests-extension/test/e2e/ingress.go-65-68 (1)
65-68:⚠️ Potential issue | 🟠 MajorIn-place sed modification of shared fixture file may cause test pollution.
Using
sed -i''to modifytestIngress(which points to a shared fixture file) permanently alters the file for subsequent test runs. If tests run in parallel or the file isn't restored, later tests will use modified values. Consider using a temporary copy of the fixture file instead.Proposed approach
Create a temporary copy before modification:
// Create a temp copy of the fixture tmpFile, err := os.CreateTemp("", "ingress-resource-*.yaml") o.Expect(err).NotTo(o.HaveOccurred()) defer os.Remove(tmpFile.Name()) content, err := os.ReadFile(testIngress) o.Expect(err).NotTo(o.HaveOccurred()) // Perform substitution in memory and write to temp file modifiedContent := strings.ReplaceAll(string(content), "edgehostname", routeEdgeHost) // ... other replacements err = os.WriteFile(tmpFile.Name(), []byte(modifiedContent), 0644) o.Expect(err).NotTo(o.HaveOccurred()) createResourceFromFile(oc, ns, tmpFile.Name())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/ingress.go` around lines 65 - 68, The current in-place sed call that builds sedCmd and mutates testIngress causes test pollution; instead create a temporary copy of the fixture, read testIngress into memory, perform the hostname substitutions for routeEdgeHost, routePassHost, routeReenHost, routeRandHost (use strings.ReplaceAll or similar), write the modified content to a temp file created via os.CreateTemp, defer os.Remove(tmp.Name()), and then call createResourceFromFile(oc, ns, tmp.Name()); remove the exec.Command("bash","-c", sedCmd) call and the direct use of testIngress so the shared fixture remains unchanged.tests-extension/test/e2e/dns.go-1172-1174 (1)
1172-1174:⚠️ Potential issue | 🟠 MajorIn-place sed modification of shared fixture file.
Same pattern as other test files - using
sed -i''onegressFirewallpermanently modifies the shared fixture file.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/dns.go` around lines 1172 - 1174, The test currently runs sed -i'' directly on the shared fixture file (egressFirewall) which mutates the repository fixture; instead, create a temporary copy of egressFirewall (use os.CreateTemp/os.MkdirTemp and copy the file contents), update the code to run sed on that temp path (replace references to egressFirewall with the temp file variable used in sedCmd), run exec.Command on the temp file, and ensure the temp file is removed after the test; reference the sedCmd construction and egressFirewall variable in dns.go and add temp file creation/cleanup around the sed execution.tests-extension/test/e2e/route-admission.go-169-174 (1)
169-174:⚠️ Potential issue | 🟠 MajorPotential stale pod reference after rolling update.
After patching
namespaceOwnershiptoStricton line 170, the test reads the env fromrouterpod(line 173) which was captured before the patch. If the patch triggers a pod restart,routerpodmay no longer exist or may be terminating. Should usenewRouterpodor re-fetch the pod name.Proposed fix
exutil.By("5. Patch the custom ingress controller and set namespaceOwnership to Strict") patchResourceAsAdmin(oc, ingctrl.namespace, "ingresscontrollers/"+ingctrl.name, "{\"spec\":{\"routeAdmission\":{\"namespaceOwnership\":\"Strict\"}}}") exutil.By("6. Check the ROUTER_DISABLE_NAMESPACE_OWNERSHIP_CHECK env variable, which should be false") -namespaceOwnershipEnv = readRouterPodEnv(oc, routerpod, "ROUTER_DISABLE_NAMESPACE_OWNERSHIP_CHECK") +// Re-fetch pod name since Strict is default and may not trigger new rollout +currentRouterpod := getOneNewRouterPodFromRollingUpdate(oc, ingctrl.name) +namespaceOwnershipEnv = readRouterPodEnv(oc, currentRouterpod, "ROUTER_DISABLE_NAMESPACE_OWNERSHIP_CHECK")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/route-admission.go` around lines 169 - 174, The test patches namespaceOwnership via patchResourceAsAdmin which can trigger a router pod restart; however the code then reads environment from the previously captured routerpod via readRouterPodEnv, risking a stale/terminated reference. After calling patchResourceAsAdmin(oc, ingctrl.namespace, "ingresscontrollers/"+ingctrl.name, ...) re-fetch the current router pod name (use newRouterpod or call the same logic that populates newRouterpod) and then call readRouterPodEnv against that fresh pod name (rather than routerpod) to ensure you're checking the env on the current running pod.tests-extension/test/e2e/aws_util.go-54-56 (1)
54-56:⚠️ Potential issue | 🟠 MajorHardcoded AWS region will cause failures on non-us-west-2 clusters.
Both
newStsClient()andnewIamClient()hardcodeus-west-2in their AWS SDK configuration. Clusters deployed in other regions will fail because these clients operate in the wrong region. The calling functions (prepareAllForStsClusterandclearUpAlbOnStsCluster) receive the cluster CLI object and can retrieve the actual region usingexutil.GetAWSClusterRegion(oc), which is already used elsewhere in the codebase.Also applies to: lines 80-82
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/aws_util.go` around lines 54 - 56, newStsClient and newIamClient currently hardcode "us-west-2" in the AWS SDK config; change them to accept or derive the correct region from the cluster CLI object used by their callers (prepareAllForStsCluster and clearUpAlbOnStsCluster) by calling exutil.GetAWSClusterRegion(oc) and pass that region into config.LoadDefaultConfig (also update the similar hardcoded region usage at lines 80-82). Ensure the region value is validated/not empty before creating the config so the STS and IAM clients are created in the cluster's actual region.tests-extension/test/e2e/route-hsts.go-198-200 (1)
198-200:⚠️ Potential issue | 🟠 MajorRestore the cluster HSTS policy instead of deleting it.
These patches target the top-level
/spec/requiredHSTSPoliciesfield, and the deferred cleanup deletes that field entirely. If the cluster already has HSTS policies, these specs wipe them out for the duration of the run and then leave them gone, which can leak state into unrelated tests. Snapshot the current value and restore that exact payload indefer.Also applies to: 210-211, 263-265, 320-323, 332-333, 385-387, 442-444, 530-532, 584-585
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/route-hsts.go` around lines 198 - 200, The deferred cleanup currently removes the entire /spec/requiredHSTSPolicies which can clobber existing cluster HSTS settings; change the logic around patchGlobalResourceAsAdmin and its defer so you first GET (snapshot) the current value of spec.requiredHSTSPolicies (using the same oc client/context), store that JSON payload, then in the defer call PATCH/PUT that exact snapshot back instead of issuing a remove operation; update all locations using patchGlobalResourceAsAdmin (the test functions around ingctldomain) to capture and restore the original payload rather than deleting the field.tests-extension/test/e2e/route-hsts.go-159-159 (1)
159-159:⚠️ Potential issue | 🟠 MajorAdd
g.Serialdecorator to these test specs.Tests at lines 159, 224, 281, 346, 403, 463, and 551 use textual
[Serial]or[Disruptive]tags in their names, but under Ginkgo v2, these markers have no effect without the correspondingg.Serialorg.Ordereddecorator. The OTE entrypoint only processes platform-related tags and does not handle serialization markers.These specs patch cluster-scoped resources (ingress config and ingresscontrollers) and must run serially to avoid conflicts. Add the
g.Serialdecorator to each:- g.It("Author:aiyengar-NonHyperShiftHOST-Critical-43474-The includeSubDomainsPolicy parameter can configure subdomain policy to inherit the HSTS policy of parent domain [Serial]", func() { + g.It("Author:aiyengar-NonHyperShiftHOST-Critical-43474-The includeSubDomainsPolicy parameter can configure subdomain policy to inherit the HSTS policy of parent domain [Serial]", g.Serial, func() {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/route-hsts.go` at line 159, The listed Ginkgo test specs (the g.It calls with titles like "Author:aiyengar-NonHyperShiftHOST-Critical-43474-The includeSubDomainsPolicy parameter..." and the other specs at the noted locations) are only marked with "[Serial]" or "[Disruptive]" in their names but lack the g.Serial decorator; update each affected spec by adding the g.Serial decorator to the corresponding g.It invocation (e.g., change g.It(...) to g.It(..., g.Serial) or wrap the spec with g.Describe/Ginkgo decorator usage that includes g.Serial) so the tests run serially when patching cluster-scoped resources like ingressconfig/ingresscontrollers. Ensure you add g.Serial to every spec referenced (lines 159, 224, 281, 346, 403, 463, 551) and keep the original test bodies intact.tests-extension/test/e2e/route-hsts.go-217-219 (1)
217-219:⚠️ Potential issue | 🟠 MajorRemove the shell quotes from the
hsts_headerannotation arguments.The unquoted values at lines 39, 47, 55, and 209 work correctly, but lines 217, 339, 450, 455, 538, and 543 include single quotes that become part of the literal annotation value. Since
Args()passes raw argv without shell parsing, the validation receives'max-age=...'(with literal quotes) instead ofmax-age=..., preventing proper HSTS policy enforcement checks.Representative fix
- msg2, err := oc.Run("annotate").WithoutNamespace().Args("route/route-edge", "haproxy.router.openshift.io/hsts_header='max-age=50000'", "--overwrite").Output() + msg2, err := oc.Run("annotate").WithoutNamespace().Args("route/route-edge", "haproxy.router.openshift.io/hsts_header=max-age=50000", "--overwrite").Output()Also applies to: 339, 450, 455, 538, 543
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/route-hsts.go` around lines 217 - 219, The annotation arguments passed to oc.Run("annotate").WithoutNamespace().Args(...) include literal single quotes around the hsts_header value (e.g., "haproxy.router.openshift.io/hsts_header='max-age=50000'"), which makes the quotes part of the annotation; remove the single quotes so the Args entry is haproxy.router.openshift.io/hsts_header=max-age=50000 (and likewise for other test calls to oc.Run("annotate").Args that set haproxy.router.openshift.io/hsts_header) so validation receives the raw max-age value; update the occurrences shown (the annotated Args calls for route/route-edge and the other similar test cases) accordingly.
🟡 Minor comments (6)
tests-extension/test/e2e/tuning.go-193-197 (1)
193-197:⚠️ Potential issue | 🟡 MinorInconsistent time format assertion on line 196.
The assertion for
ROUTER_DEFAULT_SERVER_TIMEOUTis missing the "s" suffix, unlike the other timeout assertions. WhileContainSubstring("33")will match"33s", it's inconsistent and less precise than the other assertions.Proposed fix
o.Expect(checkenv).To(o.ContainSubstring(`ROUTER_CLIENT_FIN_TIMEOUT=3s`)) o.Expect(checkenv).To(o.ContainSubstring(`ROUTER_DEFAULT_CLIENT_TIMEOUT=33s`)) - o.Expect(checkenv).To(o.ContainSubstring(`ROUTER_DEFAULT_SERVER_TIMEOUT=33`)) + o.Expect(checkenv).To(o.ContainSubstring(`ROUTER_DEFAULT_SERVER_TIMEOUT=33s`)) o.Expect(checkenv).To(o.ContainSubstring(`ROUTER_DEFAULT_SERVER_FIN_TIMEOUT=3s`))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/tuning.go` around lines 193 - 197, The assertion checking ROUTER_DEFAULT_SERVER_TIMEOUT is missing the "s" suffix; update the expectation that uses readRouterPodEnv / checkenv to assert the full unit (e.g., "ROUTER_DEFAULT_SERVER_TIMEOUT=33s") instead of "33" so it matches the other timeout assertions and is precise.tests-extension/test/e2e/dns-operator.go-192-200 (1)
192-200:⚠️ Potential issue | 🟡 MinorTypos in step descriptions.
"updateh" should be "update" on lines 192 and 197.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/dns-operator.go` around lines 192 - 200, The test step descriptions contain typos: change the strings passed to exutil.By from "updateh the custom zones policy to RoundRobin, check Corefile and ensure it is updated " and "updateh the custom zones policy to Sequential, check Corefile and ensure it is updated" to use "update" instead of "updateh"; update the two occurrences surrounding the calls to patchGlobalResourceAsAdmin, pollReadDnsCorefile and o.Expect so the human-readable test output (exutil.By) reads "update the custom zones policy..." for both RoundRobin and Sequential cases.tests-extension/test/e2e/ingress.go-217-219 (1)
217-219:⚠️ Potential issue | 🟡 MinorMinor typo in log message.
"skiping" should be "skipping" (missing 'p').
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/ingress.go` around lines 217 - 219, Fix the typo in the log message inside the conditional that calls checkProxy(oc): update the e2e.Logf call that currently logs "This is proxy cluster, skiping the curling part." to use "skipping" (i.e., "This is proxy cluster, skipping the curling part.") so the message reads correctly; locate this in the block where checkProxy(oc) returns true and adjust the string passed to e2e.Logf.tests-extension/test/e2e/dns-operator.go-95-96 (1)
95-96:⚠️ Potential issue | 🟡 MinorExtra closing brace in JSONPath may cause parsing issues.
The JSONPath string has an extra
}at the end:...message}}should be...message}.Proposed fix
- outputOpcfg, errOpcfg := oc.AsAdmin().WithoutNamespace().Run("get").Args( - "dns.operator", "default", `-o=jsonpath={.status.conditions[?(@.type=="Degraded")].message}}`).Output() + outputOpcfg, errOpcfg := oc.AsAdmin().WithoutNamespace().Run("get").Args( + "dns.operator", "default", `-o=jsonpath={.status.conditions[?(@.type=="Degraded")].message}`).Output()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/dns-operator.go` around lines 95 - 96, The JSONPath passed to oc.AsAdmin().WithoutNamespace().Run("get").Args(...) has an extra closing brace causing parsing errors; update the Args call that sets `-o=jsonpath=...` (used when building outputOpcfg/errOpcfg) to remove the extra `}` so the string ends with `...message}` instead of `...message}}`, then run the command to verify Output()/err handling remains correct.tests-extension/test/e2e/cloud_load_balancer.go-237-238 (1)
237-238:⚠️ Potential issue | 🟡 MinorIncorrect fmt.Sprintf usage - missing argument.
fmt.Sprintfis called with%splaceholder but the error variable is passed as the argument toexutil.AssertWaitPollNoErr, not toSprintf. The message will contain literal%s.Proposed fix
- exutil.AssertWaitPollNoErr(err, fmt.Sprintf("reached max time allowed but the error message doesn't appear", err)) + exutil.AssertWaitPollNoErr(err, "reached max time allowed but the error message doesn't appear")Or if the error should be included:
- exutil.AssertWaitPollNoErr(err, fmt.Sprintf("reached max time allowed but the error message doesn't appear", err)) + exutil.AssertWaitPollNoErr(err, fmt.Sprintf("reached max time allowed but the error message doesn't appear: %v", err))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/cloud_load_balancer.go` around lines 237 - 238, The fmt.Sprintf call used inside the exutil.AssertWaitPollNoErr invocation is missing the error argument, so the format string will contain a literal %s; update the call that constructs the message (the fmt.Sprintf used before exutil.AssertWaitPollNoErr) to pass the err variable into fmt.Sprintf (or remove fmt.Sprintf and pass a single formatted string created with fmt.Sprintf including err) so the actual error is included in the assertion message; look for the fmt.Sprintf and exutil.AssertWaitPollNoErr usage around the err variable and ensure the formatted string receives err as an argument.tests-extension/test/e2e/aws_util.go-266-280 (1)
266-280:⚠️ Potential issue | 🟡 MinorEIP allocation lacks cleanup on partial failure.
If
allocateElaticIPfails partway through the loop (e.g., on the 3rd allocation of 4), the previously allocated EIPs (elements 0-2) are returned but may not be properly tracked for cleanup if the caller's defer hasn't captured them yet.Suggested improvement
func allocateElaticIP(oc *exutil.CLI, num int) []string { var eipAllocationsList []string clusterinfra.GetAwsCredentialFromCluster(oc) mySession := session.Must(session.NewSession()) svc := ec2.New(mySession) for i := 0; i < num; i++ { allocRes, err := svc.AllocateAddress(&ec2.AllocateAddressInput{Domain: aws.String("vpc")}) - o.Expect(err).NotTo(o.HaveOccurred()) + if err != nil { + // Cleanup already allocated EIPs before failing + if len(eipAllocationsList) > 0 { + ensureReleaseElaticIP(oc, &eipAllocationsList) + } + o.Expect(err).NotTo(o.HaveOccurred()) + } e2e.Logf("The eip allocation details for the %v element is %v", i, allocRes) eipAllocationsList = append(eipAllocationsList, *allocRes.AllocationId) } return eipAllocationsList }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/e2e/aws_util.go` around lines 266 - 280, The allocateElaticIP function can leak earlier allocations if an AllocateAddress call fails mid-loop; change the error path so that on any allocation error you immediately release any AllocationIds already appended to eipAllocationsList before failing the test. In practice, inside allocateElaticIP, after a failed svc.AllocateAddress call (or before o.Expect fails), iterate eipAllocationsList and call svc.ReleaseAddress(&ec2.ReleaseAddressInput{AllocationId: aws.String(id)}) for each id (log but ignore individual release errors), then fail (keep the existing o.Expect or return an error if you prefer API change). Reference symbols: allocateElaticIP, eipAllocationsList, svc, allocRes, and use ec2.ReleaseAddressInput with AllocationId to perform cleanup.
| buildPruningBaseDir := testdata.FixturePath("router", "extdns") | ||
| awsStsCredSecret := filepath.Join(buildPruningBaseDir, "aws-sts-creds-secret.yaml") | ||
| updateFilebySedCmd(awsStsCredSecret, "external-dns-role-arn", exDnsRoleArn) | ||
|
|
||
| _, err := oc.AsAdmin().WithoutNamespace().Run("create").Args("-f", awsStsCredSecret).Output() | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
| // verify the secret creation | ||
| output := getByJsonPath(oc, ns, "secret", "{.items[*].metadata.name}") | ||
| o.Expect(output).Should(o.ContainSubstring("aws-sts-creds")) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check how updateFilebySedCmd is implemented and if there's cleanup logic
ast-grep --pattern 'func updateFilebySedCmd($_, $_, $_) {
$$$
}'
# Check for other usages of this pattern
rg -n 'updateFilebySedCmd' --type=goRepository: openshift/router
Length of output: 1394
🏁 Script executed:
sed -n '2207,2230p' tests-extension/test/e2e/util.goRepository: openshift/router
Length of output: 1508
🏁 Script executed:
rg -A 10 'func updateFilebySedCmd' tests-extension/test/e2e/util.goRepository: openshift/router
Length of output: 595
🏁 Script executed:
rg -B 3 -A 10 'func FixturePath' --type=goRepository: openshift/router
Length of output: 2074
🏁 Script executed:
rg -B 5 'fixtureDir' tests-extension/test/e2e/testdata/fixtures.go | head -30Repository: openshift/router
Length of output: 458
🏁 Script executed:
rg -n 'BeforeEach|AfterEach|Setup|Teardown|DeferCleanup' tests-extension/test/e2e/externaldns_util.goRepository: openshift/router
Length of output: 42
🏁 Script executed:
sed -n '80,110p' tests-extension/test/e2e/externaldns_util.goRepository: openshift/router
Length of output: 1516
🏁 Script executed:
rg -n 'CleanupFixtures' tests-extension/test/e2e/Repository: openshift/router
Length of output: 219
🏁 Script executed:
rg -n 'parallel|Parallel|PARALLEL' tests-extension/test/e2e/ --type=go | head -20Repository: openshift/router
Length of output: 42
🏁 Script executed:
rg -B 5 'awsStsCredSecret|aws-sts-creds-secret' tests-extension/test/e2e/ --type=goRepository: openshift/router
Length of output: 3462
🏁 Script executed:
rg -r 'CleanupFixtures\(\)' tests-extension/ --type=goRepository: openshift/router
Length of output: 3003
🏁 Script executed:
find tests-extension -name "*.go" -type f | xargs grep -l "ginkgo\|testing.M" | head -5Repository: openshift/router
Length of output: 251
🏁 Script executed:
rg -B 10 'func init\(\)' tests-extension/test/e2e/testdata/fixtures.goRepository: openshift/router
Length of output: 146
🏁 Script executed:
rg 'CleanupFixtures\(' tests-extension/ --type=go -A 2 -B 2 | grep -v '^--$'Repository: openshift/router
Length of output: 696
🏁 Script executed:
cat tests-extension/cmd/main.go | head -100Repository: openshift/router
Length of output: 3143
Fixture files are modified in-place without isolation, causing test interference and race conditions.
updateFilebySedCmd uses sed -i to modify fixture files directly on disk in a shared temporary directory. Since CleanupFixtures() is never invoked between tests, modifications persist across test runs. Multiple tests calling updateFilebySedCmd on the same fixture files will cause race conditions in parallel execution and leave fixtures in altered states when tests fail.
Create a temporary copy of the fixture file before modification, or use template substitution at runtime instead of sed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests-extension/test/e2e/externaldns_util.go` around lines 90 - 98, The test
mutates shared fixture files via updateFilebySedCmd (which uses sed -i) causing
inter-test interference; fix by making a temporary copy of the fixture before
modifying it (use testdata.FixturePath("router","extdns") to locate the
original), run updateFilebySedCmd against the temp copy, and pass that temp file
(not the original aws-sts-creds-secret.yaml) to
oc.AsAdmin().WithoutNamespace().Run("create").Args("-f", tempFile).Output();
ensure the temp file is removed in test cleanup (or call CleanupFixtures()) so
fixtures remain immutable for other tests.
| "ports": [ | ||
| { | ||
| "containerPort": 8080, | ||
| "securityContext": { | ||
| "privileged": "true" | ||
| } | ||
| } | ||
| ] |
There was a problem hiding this comment.
Invalid schema: securityContext placed inside ports array.
The securityContext is incorrectly nested under each port definition. Kubernetes does not support securityContext at the port level—it must be defined at the container level. Additionally, privileged should be a boolean, not a string.
The current structure will be silently ignored, and containers will not run privileged as intended.
🐛 Proposed fix
"containers": [{
"name": "httpbin-http",
"image": "quay.io/openshifttest/httpbin@sha256:cc44fbd857f4148d8aad8359acc03efa719517e01d390b152e4f3830ad871c9f",
+ "securityContext": {
+ "privileged": true
+ },
"ports": [
{
- "containerPort": 8080,
- "securityContext": {
- "privileged": "true"
- }
+ "containerPort": 8080
}
]
},
{
"name": "httpbin-https",
"image": "quay.io/openshifttest/httpbin@sha256:f57f4e682e05bcdadb103c93ae5ab9be166f79bcbbccaf45d92a2cad18da8d64",
+ "securityContext": {
+ "privileged": true
+ },
"ports": [
{
- "containerPort": 8443,
- "securityContext": {
- "privileged": "true"
- }
+ "containerPort": 8443
}
]
}]Also applies to: 26-33
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests-extension/test/e2e/testdata/router/httpbin-pod-withprivilege.json`
around lines 14 - 21, Move the misplaced securityContext out of the port object
into the parent container object and change privileged from a string to a
boolean; specifically, remove the "securityContext" block currently inside the
"ports" array (the object containing "containerPort": 8080 and
"securityContext") and add a "securityContext" entry at the same level as
"ports" inside the container definition, setting "privileged": true (boolean) so
the container-level securityContext is applied correctly.
|
@ming1013: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
========================================
🎉 OTE Migration Complete!
Summary
Successfully migrated router to OTE framework using single-module strategy.
Created Structure
Directory tree:
Key Features
Next Steps
1. Verify Build