Skip to content

feat(CC-0016): G002: Implement Keystone Chainsaw E2E test suite#76

Merged
berendt merged 1 commit intomainfrom
feature/CC-0016
Mar 22, 2026
Merged

feat(CC-0016): G002: Implement Keystone Chainsaw E2E test suite#76
berendt merged 1 commit intomainfrom
feature/CC-0016

Conversation

@berendt
Copy link
Contributor

@berendt berendt commented Mar 22, 2026

Size: 🏗️ large
Category: infrastructure
Priority: medium
Feature ID: CC-0016 (S016)


Open Questions

No open questions — both decisions resolved by user:

  1. missing-secret/ isolation → Use unique secretRef names per test CR
  2. CR name collisions with parallel: 4 → Unique CR name per test directory

Summary

Create 9 Chainsaw E2E test suites under tests/e2e/keystone/ validating the Keystone operator's full lifecycle on a real cluster. Each suite targets a specific operational scenario: happy-path deployment, secret dependency recovery, Fernet key rotation, replica scaling, garbage collection on deletion, policy override injection, middleware pipeline customization, brownfield (unmanaged) database mode, and rolling image upgrades. Together they cover the operator's condition progression (SecretsReadyDatabaseReadyFernetKeysReadyDeploymentReadyBootstrapReadyReady), owned resource management, and resilience to operational changes.


Scope

Included:

  • basic-deployment/ — Apply valid Keystone CR (managed mode with clusterRef), assert all 5 sub-conditions progress to True and aggregate Ready=True (reason: AllReady), verify Deployment {name}-api with availableReplicas > 0, Service {name}-api on port 5000, immutable ConfigMap {name}-config-{hash}, CronJob {name}-fernet-rotate, fernet Secret {name}-fernet-keys, RBAC resources (ServiceAccount/Role/RoleBinding {name}-fernet-rotate), PushSecret {name}-fernet-keys-backup. Single script step with curl http://{name}-api.openstack.svc:5000/v3 to verify functional API response.
  • missing-secret/ — Apply Keystone CR referencing non-existent Secret names (unique per test, e.g. missing-secret-keystone-db), assert SecretsReady=False (reason: WaitingForDBCredentials or WaitingForAdminCredentials), create prerequisite Secrets, assert recovery to Ready=True
  • fernet-rotation/ — Verify CronJob {name}-fernet-rotate schedule matches spec.fernet.rotationSchedule, trigger manual rotation via kubectl create job --from=cronjob/{name}-fernet-rotate, assert {name}-fernet-keys Secret .data changes, assert Deployment pod-template annotation keystone.c5c3.io/fernet-keys-hash updates (rolling restart)
  • scale/ — Apply Keystone CR with replicas: 3, patch to replicas: 5, assert Deployment.spec.replicas and status.availableReplicas update, patch to replicas: 2, assert scale-down
  • deletion-cleanup/ — Deploy Keystone CR, wait for Ready=True, delete CR, assert all owned resources return NotFound via error checks: Deployment, Service, ConfigMap, CronJob, Jobs (db-sync, bootstrap), Secret (fernet-keys), ServiceAccount, Role, RoleBinding, PushSecret
  • policy-overrides/ — Apply Keystone CR with policyOverrides.configMapRef, assert generated ConfigMap contains policy.yaml data key, assert keystone.conf contains [oslo_policy] section with policy_file = /etc/keystone/policy.yaml
  • middleware-config/ — Apply Keystone CR with custom middleware entries (name, filterFactory, position), assert api-paste.ini key in generated ConfigMap contains the modified pipeline filter references
  • brownfield-database/ — Apply Keystone CR with explicit database.host: openstack-db.openstack.svc.cluster.local and database.port: 3306 (no clusterRef), assert NO databases.k8s.mariadb.com or users.k8s.mariadb.com or grants.k8s.mariadb.com CRs are created (via error checks), assert keystone.conf [database].connection contains the explicit host
  • image-upgrade/ — Apply Keystone CR, wait for Ready=True, patch spec.image.tag, assert Deployment container image updates, assert Ready=True maintained after rollout completes

Excluded:

  • Performance/stress tests (S024 scope)
  • ControlPlane/c5c3-operator integration (S022/S023 scope)
  • TLS certificate validation (not in S016 spec)
  • Federation tests (no federation reconciler yet)
  • Webhook validation tests (already covered by invalid-cr/ in CC-0012)
  • Cache brownfield testing (not in S016 spec; only database brownfield requested)

Visualization

flowchart TD
    subgraph E2E["tests/e2e/keystone/"]
        BD["basic-deployment/"]
        MS["missing-secret/"]
        FR["fernet-rotation/"]
        SC["scale/"]
        DC["deletion-cleanup/"]
        PO["policy-overrides/"]
        MC["middleware-config/"]
        BF["brownfield-database/"]
        IU["image-upgrade/"]
    end

    subgraph Conditions["Condition Coverage"]
        SR["SecretsReady"]
        DR["DatabaseReady"]
        FKR["FernetKeysReady"]
        DPR["DeploymentReady"]
        BR["BootstrapReady"]
        RDY["Ready"]
    end

    BD --> SR & DR & FKR & DPR & BR & RDY
    MS --> SR
    FR --> FKR
    SC --> DPR
    DC --> RDY
    PO --> DR & DPR
    MC --> DPR
    BF --> DR & DPR
    IU --> DPR & RDY
Loading
sequenceDiagram
    participant CH as Chainsaw
    participant K8s as Kubernetes API
    participant KR as Keystone Reconciler
    participant OW as Owned Resources

    Note over CH: basic-deployment
    CH->>K8s: apply Keystone CR + prerequisite Secrets
    K8s->>KR: reconcile
    KR->>OW: create fernet Secret, CronJob, RBAC, ConfigMap, MariaDB CRs, db-sync Job, Deployment, Service, bootstrap Job, PushSecret
    CH->>K8s: assert conditions SecretsReady...Ready = True
    CH->>K8s: assert Deployment availableReplicas > 0
    CH->>K8s: script curl /v3

    Note over CH: missing-secret
    CH->>K8s: apply Keystone CR with unique secretRef names
    CH->>K8s: assert SecretsReady = False
    CH->>K8s: create prerequisite Secrets
    CH->>K8s: assert Ready = True

    Note over CH: fernet-rotation
    CH->>K8s: apply Keystone CR, wait Ready
    CH->>K8s: record fernet Secret .data hash
    CH->>K8s: script kubectl create job --from=cronjob
    CH->>K8s: assert fernet Secret .data changed
    CH->>K8s: assert Deployment annotation changed

    Note over CH: deletion-cleanup
    CH->>K8s: delete Keystone CR
    CH->>K8s: error assert Deployment not found
    CH->>K8s: error assert Service not found
    CH->>K8s: error assert CronJob not found
Loading
stateDiagram-v2
    [*] --> SecretsReady: ESO Secrets synced
    SecretsReady --> DatabaseReady: MariaDB CRs + db_sync Job
    DatabaseReady --> FernetKeysReady: Fernet Secret + CronJob + PushSecret
    FernetKeysReady --> DeploymentReady: Deployment available + endpoint set
    DeploymentReady --> BootstrapReady: Bootstrap Job complete
    BootstrapReady --> Ready: All conditions True

    state "missing-secret test" as ms
    SecretsReady --> ms: SecretsReady=False path

    state "fernet-rotation test" as fr
    FernetKeysReady --> fr: Key rotation path

    state "scale test" as sc
    DeploymentReady --> sc: Replica changes
Loading

Key Components

  • basic-deployment/chainsaw-test.yaml + fixture CR + prerequisite Secrets — Happy-path test asserting full condition progression (SecretsReadyDatabaseReadyFernetKeysReadyDeploymentReadyBootstrapReadyReady=True with reason AllReady), resource existence checks (Deployment {name}-api, Service, ConfigMap, CronJob, fernet Secret, RBAC, PushSecret), and a script step with curl to verify HTTP response from /v3
  • missing-secret/chainsaw-test.yaml + fixture CR + late-created Secrets — CR references unique Secret names (e.g. missing-secret-keystone-db, missing-secret-keystone-admin). Tests SecretsReady=False when referenced Secrets don't exist, asserts reason is WaitingForDBCredentials or WaitingForAdminCredentials, then creates Secrets and asserts recovery to Ready=True
  • fernet-rotation/chainsaw-test.yaml + fixture CR — Verifies CronJob schedule field matches spec, triggers rotation via kubectl create job --from=cronjob/{name}-fernet-rotate in a script step, asserts {name}-fernet-keys Secret .data changes, asserts Deployment pod-template annotation keystone.c5c3.io/fernet-keys-hash value changes (triggering rolling restart)
  • scale/chainsaw-test.yaml + fixture CR + patch files — Three-step test (3→5→2) using Chainsaw patch steps on spec.replicas, asserting Deployment.spec.replicas and status.availableReplicas at each step
  • deletion-cleanup/chainsaw-test.yaml + fixture CR — Deploys Keystone CR, waits for Ready=True, deletes CR, asserts all owned resources return NotFound via Chainsaw error checks: Deployment, Service, ConfigMap, CronJob, Jobs (db-sync, bootstrap), Secret (fernet-keys), ServiceAccount, Role, RoleBinding, PushSecret
  • policy-overrides/chainsaw-test.yaml + fixture CR + policy ConfigMap — Applies Keystone CR with policyOverrides.configMapRef, asserts generated ConfigMap contains policy.yaml data key and keystone.conf contains [oslo_policy] section with policy_file
  • middleware-config/chainsaw-test.yaml + fixture CR — Applies Keystone CR with custom middleware entries (specifying name, filterFactory, position), asserts api-paste.ini in generated ConfigMap contains the modified pipeline filter references
  • brownfield-database/chainsaw-test.yaml + fixture CR — Applies Keystone CR with database.host/database.port (no clusterRef), asserts NO databases.k8s.mariadb.com, users.k8s.mariadb.com, or grants.k8s.mariadb.com CRs exist via error checks, asserts keystone.conf [database].connection contains the explicit host. Uses the existing openstack-db MariaDB instance.
  • image-upgrade/chainsaw-test.yaml + fixture CR + patch file — Applies Keystone CR, waits for Ready=True, patches spec.image.tag via Chainsaw patch step, asserts Deployment container image updates, asserts Ready=True maintained after rollout completes
  • Shared conventions — Each test directory contains its own Keystone CR YAML with a unique CR name (e.g., keystone-basic, keystone-scale). Each test creates its own prerequisite Secrets with test-unique names. Brownfield tests use host/port; managed tests use clusterRef pointing to openstack-db/openstack-memcached. Assertions follow existing patterns from infra-stack-health and invalid-cr: CEL expressions with backtick-quoted numerics (`0`), JMESPath condition filters (conditions[?type == 'Ready']), and extended assert timeout (5m) for full reconciliation cycles.

Note on condition order: Research confirms the sub-reconciler execution order is reconcileSecretsreconcileDatabasereconcileFernetKeysreconcileConfigreconcileDeploymentreconcileBootstrap (per keystone_controller.go subConditionTypes). The user's original elaboration had FernetKeysReady before DatabaseReady — corrected above to match the actual code.

berendt added a commit that referenced this pull request Mar 22, 2026
Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@berendt
Copy link
Contributor Author

berendt commented Mar 22, 2026

Reviewed by planwerk-review 2f0622d with Claude CLI

BLOCKING (1)

B-001: 2025.2-upgraded image tag not built/loaded in CI

File: tests/e2e/keystone/image-upgrade/01-patch-image.yaml:18
Actionability: needs-discussion
Fix: ASK

Problem: The image-upgrade test patches spec.image.tag to '2025.2-upgraded', but no CI workflow step builds, tags, or loads this image into the kind cluster. Without this image, the Deployment rollout will stall with ImagePullBackOff. The test's Step 4 script only checks the desired spec, not the actual running container image. Combined with replicas: 1 and default RollingUpdate strategy, the old pod remains available during the stalled rollout, so availableReplicas > 0 passes against the OLD pod — making this test either flaky (passes vacuously against old pods) or fails after 5m timeout.

Action Required: Add a CI step (e.g., docker tag and kind load docker-image for the 2025.2-upgraded tag) before the E2E tests run, or add an assertion in Step 4 that waits for status.updatedReplicas == spec.replicas to prove new pods actually started.


CRITICAL (0)

No findings.


WARNING (4)

W-001: All 9 tests share MySQL schema keystone — concurrent db_sync risk

File: tests/e2e/keystone/basic-deployment/00-keystone-cr.yaml
Pattern: Concurrent Resource Access
Actionability: needs-discussion
Fix: ASK

Problem: Every test CR specifies database: keystone in its spec. With parallel: 4, up to 4 concurrent keystone-manage db_sync Jobs will execute Alembic migrations against the same database simultaneously. Alembic advisory locking provides some protection, but concurrent DDL against the same schema is inherently fragile — lock contention can cause intermittent timeouts. The documentation claims tests are independent for parallel execution, which is true for Kubernetes resources but not for the shared database state.

Action Required: Use unique schema names per test (e.g., database: keystone_basic, database: keystone_scale). This requires the MariaDB instance to support multiple schemas, which it should since each test creates its own Database CR.

W-002: Sub-condition progression diagram contradicts subConditionTypes order

File: docs/reference/keystone-e2e-tests.md:454
Actionability: auto-fix
Fix: AUTO-FIX

Problem: The Sub-Condition Progression diagram shows SecretsReady → FernetKeysReady → reconcileConfig → DatabaseReady, matching the reconciler execution order. However, the subConditionTypes array in keystone_controller.go:31-37 lists DatabaseReady second and FernetKeysReady third. The diagram does not clarify whether it shows execution order vs display order, which will confuse operators viewing kubectl output where conditions appear in subConditionTypes array order.

Action Required: Add a note clarifying this is execution order, or reorder the diagram to match subConditionTypes.

W-003: Scale-down assertion passes before scale-down completes

File: tests/e2e/keystone/scale/chainsaw-test.yaml:73
Actionability: auto-fix
Fix: AUTO-FIX

Problem: After patching replicas from 5 to 2, Step 6 asserts availableReplicas >= 2. During scale-down, while 5 pods are still running, availableReplicas >= 2 is trivially true (5 >= 2). The assertion passes immediately without waiting for the actual scale-down to complete. The test proves the reconciler propagated spec.replicas: 2 but does NOT prove the scale-down actually happened.

Action Required: Use an exact equality assertion (availableReplicas == 2 via JMESPath) to verify the scale-down completed, not just that it started.

W-004: grep '2025.2' substring match would also match 2025.2-upgraded

File: tests/e2e/keystone/image-upgrade/chainsaw-test.yaml:39
Actionability: auto-fix
Fix: AUTO-FIX

Problem: The grep '2025.2' command verifies the initial image tag contains '2025.2'. But if a previous test run left the Deployment at '2025.2-upgraded' (stale state from incomplete cleanup), this assertion would still pass because '2025.2-upgraded' contains the substring '2025.2'.

Action Required: Use grep -x '.*:2025.2$' or grep '2025.2$' to anchor the match, ensuring the tag is exactly 2025.2 and not 2025.2-upgraded.


INFO (3)

I-001: deletion-cleanup does not assert MariaDB CR cleanup

File: tests/e2e/keystone/deletion-cleanup/chainsaw-test.yaml:45
Actionability: needs-discussion
Fix: ASK

Problem: Step 4 asserts 9 owned resources are deleted (Deployment, Service, CronJob, Secret, SA, Role, RoleBinding, PushSecret, Job). However, managed-mode Keystone CRs also create MariaDB Database, User, and Grant CRs. These are not included in the error assertions. If the reconciler sets owner references on MariaDB CRs, they should be garbage-collected and could be asserted. If not, this may be an intentional design choice for data retention.

Action Required: Either add MariaDB CR cleanup assertions if owner references are set, or document the intentional exclusion for data retention purposes.

I-002: missing-secret does not assert the condition reason

File: tests/e2e/keystone/missing-secret/chainsaw-test.yaml:22
Actionability: auto-fix
Fix: AUTO-FIX

Problem: Step 2 asserts SecretsReady has status: False but does not assert the reason field. The PR description claims the test asserts reason is WaitingForDBCredentials or WaitingForAdminCredentials, overstating the actual test coverage.

Action Required: Add a reason field assertion to match the PR description, or update the PR description to accurately reflect the test only checks status: False.

I-003: Condition order in docs matches execution order but not display order

File: docs/reference/keystone-e2e-tests.md:196
Actionability: auto-fix
Fix: AUTO-FIX

Problem: The basic-deployment Step 2 documentation lists conditions as SecretsReady, FernetKeysReady, DatabaseReady — matching execution order but not the subConditionTypes display order. This is consistent with the same issue identified in F3.

Action Required: Align condition order with subConditionTypes display order, or add a note clarifying the ordering convention used.


Summary

Category Count
BLOCKING 1
CRITICAL 0
WARNING 4
INFO 3

Recommendation: Do not merge until F1 (missing 2025.2-upgraded image in CI) is resolved. F2 (shared database schema) and F4 (scale-down assertion) should be fixed before merge to prevent flaky tests. F3 and F5 are straightforward fixes that should be included. Informational items can be addressed in a follow-up.

- Add 9 Chainsaw E2E test suites covering Keystone reconciler scenarios:
basic-deployment, missing-secret, fernet-rotation, scale,
deletion-cleanup, policy-overrides, middleware-config,
brownfield-database, and image-upgrade
- Validate full reconciliation cycles with JMESPath condition assertions
on all 5 sub-conditions (DatabaseReady, SecretsReady, FernetKeysReady,
BootstrapReady, DeploymentReady) and aggregate Ready condition
- Verify owned resource creation (Deployment, Service, CronJob, Secret,
RBAC, PushSecret, ConfigMap) and proper cleanup on CR deletion
- Test day-2 operations: replica scaling, image upgrades with rolling
restarts, and fernet key rotation with annotation hash changes
- Cover error recovery path where missing secrets cause degraded state
followed by reconciliation to Ready after late secret creation
- Test brownfield database mode with pre-existing credentials and
policy overrides via external ConfigMap references
- Validate middleware configuration propagation through the pipeline
config chain
- Add comprehensive reference documentation for all E2E test suites
in docs/reference/keystone-e2e-tests.md
- Update VitePress sidebar config and keystone-crd reference to
cross-link the new E2E test documentation
- Add Chainsaw E2E test structure pattern to .planwerk/patterns for
consistent JMESPath condition assertion conventions

AI-assisted: Claude Code
On-behalf-of: @SAP christian.berendt@sap.com
Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
@berendt berendt merged commit 7c7c2ea into main Mar 22, 2026
5 checks passed
@berendt berendt deleted the feature/CC-0016 branch March 22, 2026 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant