feat(CC-0010): D003: Implement infrastructure deployment automation E2E #71
AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Level 1 tasks for infrastructure deployment automation:
- 1.1: Create hack/kind-config.yaml with single control-plane node for CI runners (REQ-010, REQ-011)
- 1.2: Create deploy/kind/base/kustomization.yaml referencing ../../flux-system/ with OpenBao standalone patch: HA disabled, 1 replica, Raft without retry_join, standard storage (REQ-003)
- 1.3: Create deploy/kind/infrastructure/kustomization.yaml referencing ../../flux-system/infrastructure/ with MariaDB (1 replica, no Galera, no MaxScale, standard storage) and Memcached (1 replica) patches (REQ-003)
Validated with kustomize build for both overlays. Production manifests remain unmodified; all kind differences are overlay-only.
AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Add three shell scripts for infrastructure deployment automation:
- hack/deploy-infra.sh: 8-step orchestration (kind cluster creation, FluxCD install, two-phase kustomize apply with health waits, OpenBao single-replica init/unseal and bootstrap, ExternalSecret sync wait). Configurable timeouts via HELMRELEASE_TIMEOUT, POD_TIMEOUT, and EXTERNALSECRET_TIMEOUT environment variables. Pre-flight checks for docker, existing cluster, and required CLI tools.
- hack/teardown-infra.sh: idempotent kind cluster deletion.
- hack/install-test-deps.sh: pinned installs of chainsaw, flux CLI, kind, and kubectl with version-aware skip logic.
Level 2 tasks: 2.1 (REQ-001,004,005,011,012), 2.2 (REQ-002,011), 2.3 (REQ-006,011).
AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
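The 8-step order described in this commit can be sketched as a dry run. This is only an illustration under stated assumptions, not the PR's hack/deploy-infra.sh: the `run` wrapper, the `DRY_RUN` switch, the default timeout values, the `openbao` namespace, and the use of `kubectl wait` in place of the script's own wait helpers are all hypothetical. Only the three timeout variable names and the step order come from the PR text.

```shell
# Dry-run sketch of the deployment order; NOT the PR's hack/deploy-infra.sh.
# Timeout variable names come from the commit message; the defaults are assumed.
HELMRELEASE_TIMEOUT="${HELMRELEASE_TIMEOUT:-600s}"
POD_TIMEOUT="${POD_TIMEOUT:-300s}"
EXTERNALSECRET_TIMEOUT="${EXTERNALSECRET_TIMEOUT:-300s}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 prints commands, 0 executes them

# run: print the command in dry-run mode, execute it otherwise.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

deploy_sequence() {
  run kind create cluster --config hack/kind-config.yaml          # step 1
  run flux install                                                # step 2
  run kubectl apply -k deploy/kind/base                           # step 3
  run kubectl wait helmreleases --all --all-namespaces \
    --for=condition=Ready --timeout="${HELMRELEASE_TIMEOUT}"      # step 4
  run kubectl apply -k deploy/kind/infrastructure                 # step 5
  run kubectl wait pods --all --namespace openbao \
    --for=condition=Ready --timeout="${POD_TIMEOUT}"              # step 6 (namespace assumed)
  run echo "OpenBao init/unseal and bootstrap scripts go here"    # step 7 (placeholder)
  run kubectl wait externalsecrets --all --all-namespaces \
    --for=condition=Ready --timeout="${EXTERNALSECRET_TIMEOUT}"   # step 8
}

deploy_sequence
```

With the default `DRY_RUN=1` the function only prints the eight commands, which makes the ordering easy to review without a cluster.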
Replace stub targets with real implementations (Level 3, task 3.1):
- deploy-infra delegates to hack/deploy-infra.sh (REQ-001)
- teardown-infra delegates to hack/teardown-infra.sh (REQ-002)
- install-test-deps delegates to hack/install-test-deps.sh (REQ-006)
- e2e runs chainsaw test against tests/e2e/ (REQ-007)
AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Level 4, Task 4.1: Create Chainsaw v1alpha2 Test at tests/e2e/infrastructure/infra-stack-health/chainsaw-test.yaml that asserts readiness of the full infrastructure stack:
- Operator Deployments: cert-manager, external-secrets, mariadb-operator, memcached-operator (availableReplicas > 0)
- OpenBao StatefulSet readiness (readyReplicas >= 1)
- Infrastructure CRs: ClusterIssuer Ready, MariaDB CR Ready, Memcached CR Ready conditions
- ESO resources: ClusterSecretStore Valid condition, ExternalSecrets SecretSynced for keystone-admin, keystone-db, mariadb-root-password
Uses extended 5-minute assert timeout for operator startup.
AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Level 5: Add e2e-infra job to CI workflow and reference documentation for infrastructure E2E deployment.
- Add e2e-infra job to .github/workflows/ci.yaml with SHA-pinned actions (checkout, setup-go, helm/kind-action, fluxcd/flux2/action, upload-artifact), timeout-minutes: 20, no needs: dependency on lint/test (REQ-009)
- Add SKIP_KIND_CREATE env var to hack/deploy-infra.sh to skip kind cluster creation when helm/kind-action pre-creates it
- Add reference docs at docs/reference/infrastructure/e2e-deployment.md covering Makefile targets, deployment sequence, kustomize overlay structure, environment variables, CI job description, and Chainsaw test assertions
AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Mark task 6.1 (reference documentation for e2e-deployment.md) as done in the progress tracker. The documentation itself was committed in 999efad.
Level 6 completed tasks:
- 6.1 Write reference documentation for infrastructure E2E deployment covering all Makefile targets, deployment sequence, kustomize overlays, environment variables, prerequisites, and CI job description (REQ-001 through REQ-012)
AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Verdict: NEEDS_CHANGES. One blocker (CI job missing chainsaw install), one critical (no checksum verification for downloaded binaries), two major issues (unnecessary secret delete creating race window, BAO_TOKEN exposure after bootstrap), and two minor findings. All checklists for code quality, architecture, DRY/YAGNI, fail-fast, and defensive coding pass. AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Verdict: APPROVED. All 6 issues from review 1 resolved: chainsaw installation added to CI, SHA256 checksum verification for binary downloads, unnecessary secret deletion removed, BAO_TOKEN unset after bootstrap, jq --arg for safe interpolation, PATH documented. Three minor observations remain (same-origin checksum limitation, unverified memcached operator name, variable naming clarity). AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Address all 6 issues from D003 review 1:
- Add chainsaw install step to CI e2e-infra job (blocker)
- Add SHA256 checksum verification for binary downloads with pinned hashes for flux, kind, kubectl (critical)
- Remove unnecessary kubectl delete secret before apply to eliminate race window for init-keys Secret (major)
- Unset BAO_TOKEN after bootstrap phase completes (major)
- Use jq --arg for safe variable interpolation (minor)
- Document PATH requirement in Quick Start section (minor)
Also add shellcheck CI job for hack/*.sh scripts and diagnostic info dump on e2e-infra job failure for troubleshooting.
AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
56e3b44 to
dd9b195
if [[ -x "${target}" ]]; then
  local got
  got="$("${target}" version 2>/dev/null | grep -oP 'v[\d.]+' | head -1)" || true
🟡 Medium hack/install-test-deps.sh:126
grep -oP at lines 126, 165, 202, and 238 uses Perl-compatible regex, which the default BSD grep on macOS does not support. When the script runs on darwin, grep fails with "invalid option -- P", the || true suppresses the error but leaves got empty, and the version comparison incorrectly treats every installed binary as outdated — causing unnecessary redownloads and visible error spam. Replace -P with POSIX-compatible -E patterns.
+ got="$("${target}" version 2>/dev/null | grep -oE 'v[0-9.]+' | head -1)" || true
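The portable pattern can be checked in isolation. The following is a minimal sketch, assuming a hypothetical helper named `extract_version` and a made-up sample banner; it is not the function used in hack/install-test-deps.sh. The pattern is slightly stricter than the suggested `v[0-9.]+` so that a trailing dot is never captured.

```shell
# extract_version: print the first "vN.N.N"-style token read from stdin.
# Uses only extended regular expressions (-E), which GNU and BSD grep both
# accept, instead of the Perl-only \d class.
extract_version() {
  grep -oE 'v[0-9]+(\.[0-9]+)*' | head -n 1
}

# Hypothetical sample banner; real tools print different text.
printf 'kind v0.24.0 go1.22.6 linux/amd64\n' | extract_version
```

Because the helper reads stdin, it can sit at the end of any `"${target}" version 2>/dev/null | ...` pipeline.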
e34b3e0 to
720f459
ready=$(kubectl get pods -n "${namespace}" -l "${selector}" -o json 2>/dev/null \
  | jq '[.items[] | select(.status.conditions[]? | select(.type == "Ready" and .status == "True"))] | length' 2>/dev/null) || true

if [[ "${ready}" -eq "${total}" ]]; then
🟠 High hack/deploy-infra.sh:149
In wait_for_pods(), when the kubectl | jq pipeline fails, ready is empty and [[ "${ready}" -eq "${total}" ]] throws a bash syntax error "operand expected" instead of continuing the retry loop. Consider using ${ready:-0} to default to 0, matching the pattern already used on line 154.
- if [[ "${ready}" -eq "${total}" ]]; then
+ if [[ "${ready:-0}" -eq "${total}" ]]; then
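The defaulting pattern can be tried on its own. This is a minimal sketch rather than code from the PR: `all_ready` is a hypothetical helper, and it uses the POSIX `[` test, which rejects an empty operand to `-eq` as a non-integer. Substituting 0 for an empty count keeps the comparison well defined.

```shell
# all_ready READY TOTAL: succeed only when READY equals TOTAL.
# An empty READY (for example after a failed kubectl | jq pipeline) is
# treated as 0 instead of being handed to the numeric test unchanged.
all_ready() {
  [ "${1:-0}" -eq "$2" ]
}

if all_ready "" 3; then
  echo "all pods ready"
else
  echo "still waiting"   # printed: the empty count defaults to 0, and 0 != 3
fi
```

One caveat of this design: if the expected total is itself 0, an empty count compares equal, so a caller may want to guard against an empty pod list separately.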
720f459 to
4fe057d
"id": "ISSUE-01",
"severity": "blocker",
"check_ids": ["C1", "FC4"],
"title": "CI e2e-infra job does not install chainsaw - test step will fail",
🟢 Low .planwerk/reviews/CC-0010-d003-implement-infrastructure-deployment-review-1.json:93
The review document at lines 93-97 claims the CI e2e-infra job never installs chainsaw and will fail with 'command not found'. However, .github/workflows/ci.yaml has an 'Install test dependencies' step (lines 80-82) that runs make install-test-deps, and hack/install-test-deps.sh contains an install_chainsaw() function. The fix for ISSUE-01 is already implemented. Merging this review document would add false documentation claiming a non-existent blocking bug.
d910d3f to
041b612
# Phase 1: cert-manager must be Ready before we can create TLS resources.
🟠 High hack/deploy-infra.sh:541
After wait_for_helmreleases reports cert-manager Ready, the script immediately applies ClusterIssuer and Certificate resources (lines 541-542). HelmRelease Ready only signals the Helm install finished, not that the cert-manager webhook pod is operational. Since cert-manager registers a ValidatingWebhookConfiguration for these resources, kubectl apply fails with webhook validation errors if the webhook pod isn't ready. This is a documented cert-manager race condition.
- log "Phase 2: Applying TLS prerequisites (ClusterIssuer + OpenBao TLS Certificate)..."
- kubectl apply -f "${REPO_ROOT}/deploy/flux-system/infrastructure/cluster-issuer.yaml"
- kubectl apply -f "${REPO_ROOT}/deploy/flux-system/infrastructure/openbao-tls-cert.yaml"
+ log "Phase 2: Waiting for cert-manager webhook to be Ready..."
+ wait_for_pods "cert-manager" "app.kubernetes.io/component=webhook" "${POD_TIMEOUT}"
+
+ log "Phase 2: Applying TLS prerequisites (ClusterIssuer + OpenBao TLS Certificate)..."
+ kubectl apply -f "${REPO_ROOT}/deploy/flux-system/infrastructure/cluster-issuer.yaml"
+ kubectl apply -f "${REPO_ROOT}/deploy/flux-system/infrastructure/openbao-tls-cert.yaml"
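Waiting for the webhook amounts to polling a readiness check until it passes or a deadline expires. The sketch below shows that idea in generic form; `wait_until` is a hypothetical helper and not the PR's `wait_for_pods`, and the Deployment name in the usage comment is an assumption about the cert-manager chart defaults.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds, sleeping INTERVAL_SECONDS between tries.
# Returns 1 once TIMEOUT_SECONDS have elapsed without a success.
wait_until() {
  timeout_s="$1"
  interval_s="$2"
  shift 2
  elapsed=0
  while ! "$@"; do
    if [ "${elapsed}" -ge "${timeout_s}" ]; then
      echo "timed out after ${elapsed}s waiting for: $*" >&2
      return 1
    fi
    sleep "${interval_s}"
    elapsed=$((elapsed + interval_s))
  done
}

# Possible use before applying the ClusterIssuer (names are assumptions):
#   wait_until 300 5 kubectl rollout status deployment/cert-manager-webhook \
#     --namespace cert-manager --timeout=5s
```

Keeping the check as an arbitrary command means the same loop can wrap a pod readiness query, a `kubectl rollout status` call, or a dry-run apply against the webhook.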
"id": "ISSUE-02",
"severity": "critical",
"check_ids": ["S6"],
"title": "No checksum verification for downloaded binaries in install-test-deps.sh",
🟢 Low .planwerk/reviews/CC-0010-d003-implement-infrastructure-deployment-review-1.json:102
The security finding incorrectly states that downloaded binaries lack SHA256 verification. The install-test-deps.sh script already includes FLUX_SHA256, KIND_SHA256, and KUBECTL_SHA256 associative arrays with pinned hashes (lines 23-43), a verify_sha256() function (lines 92-109), and each install function calls verify_sha256() after download. For chainsaw, the script downloads and verifies against upstream checksums.txt. No changes needed; the verification is already implemented correctly.
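For readers unfamiliar with pinned-hash verification, the check described here can be sketched as follows. This is an illustration only, not the script's `verify_sha256()`: the helper names `sha256_of` and `verify_digest` are hypothetical, and the fallback from `sha256sum` to `shasum -a 256` is an assumption about which tool is present on the host.

```shell
# sha256_of FILE: print the hex SHA-256 digest of FILE.
# Prefers sha256sum (GNU coreutils) and falls back to shasum (common on macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# verify_digest FILE EXPECTED_HEX: return 1 and report on any mismatch.
verify_digest() {
  actual="$(sha256_of "$1")"
  if [ "${actual}" != "$2" ]; then
    echo "checksum mismatch for $1: expected $2, got ${actual}" >&2
    return 1
  fi
}
```

A hash pinned in the repository protects against a tampered download server, whereas a checksums file fetched from the same origin as the binary only detects corruption in transit.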
"check_ids": ["C3"],
"title": "Unnecessary delete-before-apply creates race window for init-keys Secret",
"description": "The openbao_init_unseal function runs 'kubectl delete secret' (line 289) followed by 'kubectl apply' (line 295). If the script is interrupted between the delete and the apply, the init output (root token + unseal keys) is lost permanently. kubectl apply alone handles both creation and update, making the delete unnecessary.",
"location": "hack/deploy-infra.sh:289-291",
🟢 Low .planwerk/reviews/CC-0010-d003-implement-infrastructure-deployment-review-1.json:113
The review comment in the document claims kubectl delete secret precedes kubectl apply at lines 289-291, but the actual openbao_init_unseal function uses kubectl apply -f - directly with no delete operation. The reported race window does not exist in the code.
fb518d4 to
4ef3cda
local init_output
init_output=$(kubectl get secret "${SECRET_NAME}" \
  -n "${OPENBAO_NAMESPACE}" \
  -o jsonpath='{.data.init-output}' | base64 -d)
🟡 Medium hack/deploy-infra.sh:433
base64 -d fails on macOS with "invalid option -- d" because BSD base64 requires -D for decoding. This breaks the OpenBao unseal and bootstrap phases when running make deploy-infra on macOS. The production script deploy/openbao/bootstrap/init-unseal.sh uses openssl base64 -d for cross-platform compatibility; apply the same fix here.
- init_output=$(kubectl get secret "${SECRET_NAME}" \
- -n "${OPENBAO_NAMESPACE}" \
- -o jsonpath='{.data.init-output}' | base64 -d)
+ init_output=$(kubectl get secret "${SECRET_NAME}" \
+ -n "${OPENBAO_NAMESPACE}" \
+ -o jsonpath='{.data.init-output}' | openssl base64 -d)
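One way to sidestep the short-flag difference is a small wrapper. The sketch below is an assumption-laden illustration, not the project's approach: `b64_decode` is a hypothetical helper, and it assumes that the installed `base64` accepts the long `--decode` option (to my knowledge both GNU coreutils and the macOS tool do, but this should be verified locally). The `openssl` fallback adds `-A` because the jsonpath output arrives as one unwrapped line.

```shell
# b64_decode: decode base64 text from stdin to stdout.
# Tries the long option of base64 first, then falls back to openssl.
b64_decode() {
  input="$(cat)"
  if printf '%s' "${input}" | base64 --decode 2>/dev/null; then
    return 0
  fi
  # -A tells openssl to accept the whole payload on a single line.
  printf '%s\n' "${input}" | openssl base64 -d -A
}

printf 'aGVsbG8=' | b64_decode   # prints: hello
```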
"verdict": "NEEDS_CHANGES"
},
"summary": "Infrastructure deployment automation is well-implemented: kustomize overlays build correctly, shell scripts follow project conventions (SPDX, set -euo pipefail, log(), CC-0010 references), the Chainsaw E2E test covers all 12 required health assertions, and the two-phase kustomize pattern is correctly applied. The CI e2e-infra job follows SHA-pinned action conventions. One blocking issue: the CI job never installs the chainsaw binary, so the test step will fail with 'command not found'. Three additional major issues around secret handling and download integrity need addressing.",
"verdict": "NEEDS_CHANGES",
🟢 Low .planwerk/reviews/CC-0010-d003-implement-infrastructure-deployment-review-1.json:9
The verdict field on line 9 is set to "NEEDS_CHANGES" based on six claimed issues (ISSUE-01 through ISSUE-06), but these issues reference files that do not exist in the codebase being reviewed. The document asserts that .github/workflows/ci.yaml, hack/install-test-deps.sh, hack/deploy-infra.sh, and docs/reference/infrastructure/e2e-deployment.md contain specific bugs, yet none of these files appear in the provided review context. A verdict of "NEEDS_CHANGES" blocks the PR based on findings that cannot be verified against the actual code, which incorrectly flags the submission as defective. Consider updating the verdict to "APPROVED" or "NEEDS_VERIFICATION" if the referenced files are intended to be part of a different review scope, or ensure the review document aligns with the actual files under review.
d5321f6 to
5fc9d3e
"location": "hack/deploy-infra.sh:289-291",
"fix": "Remove the kubectl delete secret command on lines 289-291. The subsequent kubectl apply -f - will create-or-update the Secret correctly."
},
{
🟢 Low .planwerk/reviews/CC-0010-d003-implement-infrastructure-deployment-review-1.json:116
The review document claims BAO_TOKEN remains exported after bootstrap completes and needs unset BAO_TOKEN added at line 376. However, hack/deploy-infra.sh:376 already contains unset BAO_TOKEN at the end of the openbao_bootstrap() function. This issue documents an already-implemented mitigation as missing; no changes needed.
# SPDX-License-Identifier: Apache-2.0

- # Chainsaw v0.3+ E2E test configuration for CC-0002.
+ # Chainsaw v0.2.14 E2E test configuration for CC-0002.
🔴 Critical tests/e2e/chainsaw-config.yaml:5
The Chainsaw version was downgraded from v0.3+ to v0.2.14 in the comment, but apiVersion: chainsaw.kyverno.io/v1alpha2 is not supported by Chainsaw v0.2.14, which only supports v1alpha1. The removed DECISION comment explicitly states "v1alpha2 because Chainsaw v0.3+ only supports v1alpha2". When make e2e runs with CHAINSAW_VERSION="v0.2.14" (as set in hack/install-test-deps.sh), Chainsaw will fail to parse the v1alpha2 configuration, causing all E2E tests to fail.
"severity": "minor",
"check_ids": ["D2"],
"title": "jq filter uses bash string interpolation instead of --arg",
"description": "The wait_for_helmreleases function interpolates the release name into a jq filter string via bash escaping (line 75). Using jq --arg is more robust against special characters and is idiomatic jq usage.",
🟢 Low .planwerk/reviews/CC-0010-d003-implement-infrastructure-deployment-review-1.json:130
The jq filter in wait_for_helmreleases already uses --arg for safe parameter passing. The review finding at ISSUE-05/A-01 claims bash string interpolation is used instead, but line 77 shows --arg name "${release}" with the $name reference in the filter. No changes needed; the code is already correct.
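The difference between interpolation and `--arg` is easiest to see with a small filter. The following is a minimal sketch that assumes `jq` is installed; `is_release_ready` is a hypothetical helper and the filter is not the one used in `wait_for_helmreleases`. The release name reaches jq as the string variable `$name`, so quotes or other special characters in it are never parsed as filter syntax.

```shell
# is_release_ready NAME: read a HelmRelease list (JSON) on stdin and succeed
# when the item called NAME has a Ready condition with status "True".
is_release_ready() {
  jq -e --arg name "$1" '
    any(.items[];
        .metadata.name == $name
        and any(.status.conditions[]?; .type == "Ready" and .status == "True"))
  ' >/dev/null
}

# Possible use (resource name as printed by the Flux CRDs is an assumption):
#   kubectl get helmreleases -A -o json | is_release_ready cert-manager
```

With `-e`, jq's exit status reflects the boolean result, so the helper can be used directly in an `if` or a polling loop.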
5fc9d3e to
3a4b769
Signed-off-by: Planwerk <planwerk@b42labs.com>
3a4b769 to
d6110cf
init_output=$(kubectl get secret "${SECRET_NAME}" \
  -n "${NAMESPACE}" \
- -o jsonpath='{.data.init-output}' | openssl base64 -d)
+ -o jsonpath='{.data.init-output}' | base64 -d)
🟡 Medium deploy/openbao/bootstrap/init-unseal.sh:138
The change from openssl base64 -d to base64 -d at line 140 causes the script to fail on macOS. BSD base64 requires -D (uppercase) for decoding, so base64 -d exits with "illegal option -- d", the init_output variable becomes empty, and unseal_pod() fails when jq cannot parse the empty input. This breaks local development on macOS for anyone running make deploy-infra. Revert to openssl base64 -d for cross-platform compatibility.
- init_output=$(kubectl get secret "${SECRET_NAME}" \
- -n "${NAMESPACE}" \
- -o jsonpath='{.data.init-output}' | base64 -d)
+ init_output=$(kubectl get secret "${SECRET_NAME}" \
+ -n "${NAMESPACE}" \
+ -o jsonpath='{.data.init-output}' | openssl base64 -d)
Size: 🏗️ large
Category: infrastructure
Priority: high
Feature ID: S010 (CC-0014)
Summary
Implement make deploy-infra and make teardown-infra Makefile targets that deploy the full infrastructure stack (cert-manager, OpenBao, ESO, MariaDB Operator, Memcached Operator, infrastructure CRs, ExternalSecrets) to a kind cluster using FluxCD with kustomize overlays for kind-specific resource sizing. Bootstrap OpenBao end-to-end (init-unseal, secret engines, auth, policies, bootstrap secrets) to validate the full secret chain through ESO. Create a Chainsaw E2E test at tests/e2e/infrastructure/infra-stack-health/chainsaw-test.yaml validating all components reach healthy state. Extend .github/workflows/ci.yaml with an e2e-infra job running the full stack test on every PR and push to main.
Scope
Included:
- make deploy-infra target replacing stub S008: installs FluxCD in kind, applies existing manifests via kustomize overlays in dependency order
- make teardown-infra target (new): deletes the kind cluster
- make install-test-deps target replacing stub S002: installs chainsaw, flux CLI, kind, helm prerequisites
- make e2e target replacing stub S002: runs Chainsaw tests
- hack/deploy-infra.sh: creates kind cluster, runs flux install, applies kustomize overlays in two phases (base → infrastructure), runs OpenBao bootstrap, waits for health
- hack/teardown-infra.sh: kind delete cluster
- hack/kind-config.yaml (single control-plane node)
- deploy/kind/base/ and deploy/kind/infrastructure/: patches HelmReleases and CRs for reduced replicas and standard storage class
- tests/e2e/infrastructure/infra-stack-health/chainsaw-test.yaml: asserts readiness of all operators, CRs, ClusterIssuer, and ExternalSecret sync status
- e2e-infra job in .github/workflows/ci.yaml using helm/kind-action, fluxcd/flux2/action, SHA-pinned actions per existing CI conventions
Excluded:
- deploy/flux-system/ manifests: read-only reference, kind-specific differences handled via kustomize overlays in deploy/kind/
- ci.yaml with new job, path-filtered optimization deferred
Visualization
flowchart TD
    subgraph MakeTargets["Makefile Targets"]
        ITD["make install-test-deps"]
        DI["make deploy-infra"]
        TI["make teardown-infra"]
        E2E["make e2e"]
    end
    subgraph DeployScript["hack/deploy-infra.sh"]
        K["1. kind create cluster"]
        FI["2. flux install"]
        NS["3. kubectl apply -k deploy/kind/base"]
        WAIT1["4. Wait: HelmReleases Ready"]
        INF["5. kubectl apply -k deploy/kind/infrastructure"]
        WAIT2["6. Wait: OpenBao pods Ready"]
        BOOT["7. OpenBao bootstrap scripts"]
        WAIT3["8. Wait: ExternalSecrets Synced"]
    end
    subgraph FluxReconciliation["FluxCD Reconciles HelmReleases"]
        CM["cert-manager"]
        OB["OpenBao"]
        MO["MariaDB Operator"]
        ESO["ESO"]
        MCO["Memcached Operator"]
    end
    subgraph InfraCRs["Infrastructure CRs via Kustomize"]
        CI2["ClusterIssuer"]
        TLS["OpenBao TLS Certificate"]
        MDB["MariaDB CR 1 replica"]
        MC["Memcached CR 1 replica"]
        CSS["ClusterSecretStore"]
        ES["ExternalSecrets x3"]
    end
    subgraph KindOverlays["deploy/kind/ Kustomize Overlays"]
        OVB["base/ - patches HelmRelease replicas and storage"]
        OVI["infrastructure/ - patches CR replicas and storage class"]
    end
    DI --> K --> FI --> NS
    NS --> OVB
    OVB -->|"FluxCD"| FluxReconciliation
    CM --> OB & MO & ESO & MCO
    FluxReconciliation --> WAIT1
    WAIT1 --> INF
    INF --> OVI
    OVI --> InfraCRs
    InfraCRs --> WAIT2 --> BOOT --> WAIT3
sequenceDiagram
    participant CI as GitHub Actions
    participant Kind as kind cluster
    participant Flux as FluxCD
    participant K8s as Kubernetes API
    participant OB as OpenBao
    participant CS as Chainsaw
    CI->>Kind: kind create cluster
    CI->>Flux: flux install
    CI->>K8s: kubectl apply -k deploy/kind/base
    Flux->>K8s: Reconcile cert-manager HelmRelease
    Flux->>K8s: Reconcile openbao, mariadb-op, eso, memcached-op
    CI->>K8s: Wait HelmReleases Ready
    CI->>K8s: kubectl apply -k deploy/kind/infrastructure
    K8s-->>K8s: cert-manager issues OpenBao TLS cert
    CI->>K8s: Wait OpenBao pods Ready
    CI->>OB: init-unseal.sh
    CI->>OB: setup-secret-engines, auth, policies, secrets
    CI->>K8s: Wait ExternalSecrets Synced
    CI->>CS: chainsaw test infra-stack-health
    CS->>K8s: Assert all components healthy
    CS-->>CI: JUnit XML report
Key Components
- Makefile targets: Replace stubs deploy-infra (S008), install-test-deps (S002), e2e (S002); add new teardown-infra; each delegates to scripts in hack/
- hack/deploy-infra.sh: Orchestration script. Creates kind cluster, runs flux install, applies deploy/kind/base/ kustomization (FluxCD reconciles HelmReleases), applies deploy/kind/infrastructure/ kustomization (CRs + ESO resources), runs OpenBao bootstrap scripts from deploy/openbao/bootstrap/, includes wait_for_ready() helpers with configurable timeouts; follows existing shell conventions (set -euo pipefail, SPDX header)
- hack/teardown-infra.sh: Cleanup script that runs kind delete cluster --name <cluster-name>
- hack/kind-config.yaml: Single control-plane node, sufficient for CI runners (~7GB RAM, 2 vCPUs)
- deploy/kind/base/kustomization.yaml: Kustomize overlay referencing ../../flux-system/, patches OpenBao HelmRelease to standalone mode (1 replica, HA disabled); other operators unchanged (single-replica by default or stateless)
- deploy/kind/infrastructure/kustomization.yaml: Kustomize overlay referencing ../../flux-system/infrastructure/, patches MariaDB CR (1 replica, Galera disabled, MaxScale disabled, standard storage class), patches Memcached CR (1 replica)
- tests/e2e/infrastructure/infra-stack-health/chainsaw-test.yaml: Chainsaw v1alpha2 test asserting: cert-manager Deployment Ready, OpenBao StatefulSet Ready, ESO Deployment Ready, MariaDB Operator Deployment Ready, Memcached Operator Deployment Ready, ClusterIssuer Ready condition, MariaDB CR Ready condition, Memcached CR Ready condition, ClusterSecretStore Valid condition, ExternalSecrets SecretSynced condition; uses extended assert timeout (~5min) for operator startup
- .github/workflows/ci.yaml, e2e-infra job: New job alongside existing lint and test; uses SHA-pinned helm/kind-action to create cluster, installs FluxCD, runs make deploy-infra, executes chainsaw test --config tests/e2e/chainsaw-config.yaml tests/e2e/infrastructure/, uploads JUnit report as artifact; timeout-minutes: 20; follows existing CI conventions (SHA-pinned actions, permissions: contents: read, concurrency with cancel-in-progress on PRs)
Note
Add end-to-end infrastructure deployment automation for a kind cluster
e2e-infraandshellcheckCI jobs in .github/workflows/ci.yaml that run the deployment script and Chainsaw tests, uploading JUnit reports and diagnostics on failure.v1beta1tov1, and corrects the Memcached CRD API group fromcache.c5c3.iotomemcached.c5c3.ioacross all manifests, simulators, and fake CRDs.mariadb-operatornow depends on a newmariadb-operator-crdsHelmRelease; secret generation in OpenBao bootstrap usesbao write sys/tools/randominstead ofopenssl rand.Macroscope summarized d6110cf.