feat: add local telemetry playground and monitoring docs by udsmicrosoft · Pull Request #286 · documentdb/documentdb-kubernetes-operator

udsmicrosoft · 2026-03-05T18:12:09Z

Summary

Adds a complete, self-contained local telemetry playground for Kind clusters, monitoring documentation, and sidecar injector OTel support.

Local Telemetry Playground (`documentdb-playground/telemetry/local/`)

One-command deployment — ./scripts/deploy.sh handles everything: Kind cluster, cert-manager, CNPG, DocumentDB operator (from GHCR), observability stack, DocumentDB HA cluster, and traffic generators.

- 3-instance DocumentDB HA cluster (1 primary + 2 streaming replicas) on Kind
- Full observability stack: OTel Collector (central + per-node DaemonSet), Prometheus, Tempo, Loki, Grafana
- Gateway metrics, traces, and logs via OTLP push
- PostgreSQL metrics via CNPG built-in exporter and OTel postgresql receiver
- System resource metrics via kubeletstats receiver (DaemonSet for full-cluster coverage)
- Pre-built Grafana dashboards:
  - Gateway: ops/sec, average latency, error rates, active connections, pool utilization, document throughput by collection, request/response sizes, gateway logs (Loki)
  - Internals: PG backends, replication lag, commits/rollbacks, row operations/sec, WAL size, database size, container CPU/memory/network/filesystem
- Traffic generator Jobs (RW to primary, RO to replicas) with error injection (~10% of iterations)
- Sample Prometheus alerting rules (GatewayHighErrorRate, PostgresReplicationLagHigh, PostgresConnectionSaturation, GatewayDown)
- Tempo→Loki and Tempo→Prometheus datasource correlations in Grafana
- validate.sh health check script verifying pods, targets, and data flow
- Support for custom gateway images via local Kind registry (localhost:5001)

Sidecar Injector Changes

- Add OTEL_TRACING_ENABLED env var for trace collection
- Add POD_NAME env var (from K8s downward API) and OTEL_RESOURCE_ATTRIBUTES with service.instance.id=$(POD_NAME) for per-instance metric attribution in multi-replica deployments
- Add imagePullPolicy to sidecar injector deployment

Monitoring Documentation (`docs/.../monitoring/`)

- Architecture overview: OTLP push flow from gateway, collector deployment modes, ServiceMonitor examples, CNPG metrics availability caveat
- Metrics reference: gateway metrics (db_client_operations_total, gateway_client_connections_active, pool/document metrics), container resources, controller-runtime (with correct controller names), CNPG/PostgreSQL metrics, OTel naming conventions
- Updated mkdocs.yml navigation

Review — PR #286: Local Telemetry Playground + Monitoring Docs

This PR has two parts: (1) a code change to the sidecar injector adding OTEL resource attributes, and (2) documentation + playground files for monitoring. Reviewing both.

Code Change: `lifecycle.go` — OTEL Resource Attributes

The code adds POD_NAME (via downward API) and OTEL_RESOURCE_ATTRIBUTES=service.instance.id=$(POD_NAME) to every gateway sidecar container.

| Check | Result |
| --- | --- |
| Env var ordering (POD_NAME before OTEL_RESOURCE_ATTRIBUTES) | ✅ Correct: K8s resolves `$(VAR)` in declaration order |
| Downward API `metadata.name` field path | ✅ Valid |
| OTEL semantic convention `service.instance.id` | ✅ Matches OTel spec |
| Insertion point (after OTEL_EXPORTER_OTLP_ENDPOINT, before credentials) | ✅ Clean |
| Existing tests cover this code path | ⚠️ No `lifecycle_test.go` found; see Major #3 |

The code change is correct and well-placed.
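The ordering check in the first row can be illustrated with a container `env` fragment. This is a minimal sketch rather than the injector's actual output: the container name `gateway` is a placeholder, and only the two variable names and their values come from the PR.

```yaml
# Sketch of the env entries added to the gateway sidecar (container name is hypothetical).
containers:
  - name: gateway
    env:
      # Declared first: $(POD_NAME) below only expands when the variable
      # it refers to appears earlier in the same list.
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      - name: OTEL_RESOURCE_ATTRIBUTES
        value: service.instance.id=$(POD_NAME)
```

If the two entries were swapped, Kubernetes would leave the literal string `$(POD_NAME)` in the attribute value, which is the regression a unit test would guard against.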

🔴 Critical (1)

1. PostgreSQL receiver will fail to authenticate — password is `"unused"`

The OTel collector config contains:

```yaml
postgresql:
  endpoint: documentdb-preview-rw.documentdb-preview-ns.svc.cluster.local:5432
  username: postgres
  password: "unused"
```

CNPG generates and manages the postgres superuser password in a Kubernetes Secret (typically <cluster-name>-superuser). The hardcoded "unused" password will cause the postgresql receiver to fail SQL authentication, meaning all PostgreSQL-level metrics (replication lag, backends, DB size, commits, rollbacks) documented in metrics.md will not actually be collected.

Fix options:

- Mount the CNPG superuser secret into the OTel collector pod and reference it via ${env:PG_MONITOR_PASSWORD}
- Create a dedicated monitoring role with limited permissions and a known password
- Add a setup step to the README that creates the monitoring credentials

This also applies to PG_MONITOR_USER: postgres / PG_MONITOR_PASSWORD: unused in the collector deployment env vars.
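The first fix option might look roughly like the two fragments below. This is a sketch under assumptions: the Secret name `documentdb-preview-superuser` follows the `<cluster-name>-superuser` pattern mentioned above, and the `username` and `password` keys are assumed rather than confirmed from the PR.

```yaml
# Fragment 1: collector container env (Secret name and keys are assumptions).
env:
  - name: PG_MONITOR_USER
    valueFrom:
      secretKeyRef:
        name: documentdb-preview-superuser   # assumed: <cluster-name>-superuser
        key: username
  - name: PG_MONITOR_PASSWORD
    valueFrom:
      secretKeyRef:
        name: documentdb-preview-superuser
        key: password
---
# Fragment 2: receiver section of the collector config, reading the env vars above.
receivers:
  postgresql:
    endpoint: documentdb-preview-rw.documentdb-preview-ns.svc.cluster.local:5432
    username: ${env:PG_MONITOR_USER}
    password: ${env:PG_MONITOR_PASSWORD}
```

A `secretKeyRef` only resolves Secrets in the pod's own namespace, so the Secret would have to exist (or be copied) where the collector runs.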

🟠 Major (3)

2. Monitoring docs missing front matter (title, description, tags)

Both overview.md and metrics.md lack YAML front matter. Per the documentation standards, every page should include:

```yaml
---
title: Monitoring Overview
description: How to monitor DocumentDB clusters using OpenTelemetry, Prometheus, and Grafana.
tags:
  - monitoring
  - observability
  - metrics
---
```

This affects search ranking, AI discoverability, and consistency with other documentation pages.

3. No tests for the `lifecycle.go` code change

The sidecar injector's lifecycle.go has no test file. The new POD_NAME and OTEL_RESOURCE_ATTRIBUTES env vars are injected into every DocumentDB gateway container — a functional change that affects all deployments. While the change is straightforward, a unit test would prevent regression (e.g., if someone reorders the env vars, breaking $(POD_NAME) resolution).

Suggestion: Add a test in operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle_test.go that verifies:

- POD_NAME env var is present with FieldRef to metadata.name
- OTEL_RESOURCE_ATTRIBUTES is present and contains service.instance.id=$(POD_NAME)
- POD_NAME appears before OTEL_RESOURCE_ATTRIBUTES in the env slice

4. Traffic generator and observability stack contain hardcoded passwords

Multiple files use plaintext passwords:

- documentdb-ha.yaml: password: DemoPassword100
- traffic-generator.yaml: -p DemoPassword100
- observability-stack.yaml: GF_SECURITY_ADMIN_PASSWORD: admin, GF_AUTH_ANONYMOUS_ORG_ROLE: Admin

For a playground/demo this is acceptable, but:

- The Grafana anonymous admin access (GF_AUTH_ANONYMOUS_ENABLED: true + GF_AUTH_ANONYMOUS_ORG_ROLE: Admin) should have a comment warning this is for demo use only
- Consider adding a ⚠️ DO NOT USE IN PRODUCTION header comment in observability-stack.yaml
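The requested warnings could be as simple as the comments in this fragment. It is a sketch only: the values are the ones quoted above, but the exact shape of the Grafana manifest in the PR (container `env` list versus ConfigMap) is an assumption.

```yaml
# ⚠️ DO NOT USE IN PRODUCTION: demo-only credentials and anonymous admin access.
env:
  - name: GF_SECURITY_ADMIN_PASSWORD
    value: admin
  - name: GF_AUTH_ANONYMOUS_ENABLED
    value: "true"
  - name: GF_AUTH_ANONYMOUS_ORG_ROLE
    value: Admin   # demo only: every unauthenticated visitor gets admin rights
```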

🟡 Minor (5)

5. Gateway metric names cannot be verified against source code

The docs reference metrics like db_client_operations_total, gateway_client_connections_active, db_client_connection_active, etc. These are emitted by the gateway binary (separate repo), not the operator Go code. I cannot verify these metric names against this repository.

Recommendation: Add a note in metrics.md indicating these metric names are from the DocumentDB Gateway and may change between versions.

6. Collector deployed as Deployment but kubeletstats only covers one node

The kubeletstats receiver with K8S_NODE_NAME env var only scrapes the node where the collector pod runs. In the 4-node Kind cluster (1 control-plane + 3 workers), you'll miss kubelet metrics from 3 of 4 nodes.

Recommendation: Add a note in the observability-stack.yaml or overview.md that kubeletstats coverage is limited in Deployment mode (vs DaemonSet).
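For context, the per-node behavior comes from how the receiver is usually wired up. The fragments below are a sketch of a common kubeletstats setup, not the PR's actual config; the interval and TLS settings are assumptions.

```yaml
# Fragment 1: collector container env. Each pod learns the name of its own node.
env:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
---
# Fragment 2: receiver config. The endpoint points at the local node's kubelet only.
receivers:
  kubeletstats:
    collection_interval: 30s        # assumed value
    auth_type: serviceAccount
    endpoint: "https://${env:K8S_NODE_NAME}:10250"
    insecure_skip_verify: true      # assumed; typical for Kind's self-signed kubelet certs
```

Because the endpoint is derived from the pod's own node, a single-replica Deployment sees one kubelet, while a DaemonSet runs one collector per node and covers them all.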

7. Kind config uses containerd v1 registry path

The kind-config.yaml uses [plugins."io.containerd.grpc.v1.cri".registry]. With kindest/node:v1.35.0 and containerd 2.x, this path may have changed to io.containerd.cri.v1.images. Verify against the target Kind version.

8. `K8S_VERSION` defaults to `v1.35.0` in setup-kind.sh

This pins to a very specific Kind image. If the image isn't available, the script fails. Consider defaulting to a known-good version or adding a version check.

9. The `documentdb-preview-rw` endpoint is hardcoded in collector config

The OTel collector config hardcodes documentdb-preview-rw.documentdb-preview-ns.svc.cluster.local:5432. This only works for a cluster named documentdb-preview in namespace documentdb-preview-ns. Document this coupling or make it configurable via env vars.
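One way to make the coupling explicit is to lift the endpoint into an environment variable. This is a sketch; `PG_ENDPOINT` is a hypothetical variable name that does not appear in the PR.

```yaml
# Fragment 1: collector container env. The only place the cluster name and namespace appear.
env:
  - name: PG_ENDPOINT   # hypothetical variable name
    value: documentdb-preview-rw.documentdb-preview-ns.svc.cluster.local:5432
---
# Fragment 2: receiver config stays generic.
receivers:
  postgresql:
    endpoint: ${env:PG_ENDPOINT}
```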

✅ What's Good

- Clean code change: The POD_NAME / OTEL_RESOURCE_ATTRIBUTES injection is well-implemented with correct K8s env var ordering
- Complete observability stack: Tempo (traces) + Loki (logs) + Prometheus (metrics) + Grafana (dashboards) + OTel Collector, giving full three-pillar observability
- ExternalName service bridge: Clever cross-namespace OTLP routing from DocumentDB namespace to observability namespace
- Comprehensive metrics reference: metrics.md covers container, gateway, operator, and PostgreSQL metrics with useful PromQL examples
- Architecture diagram: Clear ASCII art showing data flow from pods → collector → backends → Grafana
- OTel naming table: OpenTelemetry ↔ Prometheus metric name mapping table is very helpful
- Traffic generator: Well-designed mongosh script with varied CRUD operations, periodic cleanup, and error handling
- mkdocs.yml change is clean: Only adds nav entries, no extension conflicts with other PRs
- Grafana dashboard JSON: Pre-built 1528-line dashboard, so users get instant value
- RBAC for collector: Proper ClusterRole with nodes/stats, pods, services, metrics access
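The ExternalName bridge praised above generally takes the following shape. This is a sketch: the Service name `otel-collector` is hypothetical, while the two namespaces are the ones named elsewhere in this PR.

```yaml
# Lets pods in the DocumentDB namespace reach the collector by a short, local name.
apiVersion: v1
kind: Service
metadata:
  name: otel-collector              # hypothetical Service name
  namespace: documentdb-preview-ns
spec:
  type: ExternalName
  # DNS CNAME to the real collector Service in the observability namespace.
  externalName: otel-collector.observability.svc.cluster.local
```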

Summary

| Severity | Count | Items |
| --- | --- | --- |
| 🔴 Critical | 1 | PostgreSQL receiver auth will fail (`password: "unused"`) |
| 🟠 Major | 3 | Missing front matter, no lifecycle tests, hardcoded passwords need warnings |
| 🟡 Minor | 5 | Unverifiable gateway metrics, kubeletstats single-node, containerd v2, K8s version pin, hardcoded endpoint |
| ✅ | - | Code change correct, complete stack, clean mkdocs, good architecture |

Verdict: The code change to lifecycle.go is correct and adds genuine value (per-instance metric attribution). The observability stack is comprehensive and well-designed. Fix the PostgreSQL receiver authentication (Critical #1) — without it, half the documented metrics won't actually be collected. Add front matter to the docs and ideally a unit test for the env var injection.

- Restructured telemetry playground into organized k8s/, dashboards/, scripts/ layout
- Split monolithic observability-stack.yaml into per-component files
- Added two Grafana dashboards: Gateway (ops, latency, errors, connections) and Internals (PostgreSQL, indexes, infrastructure)
- Added deploy.sh and teardown.sh scripts for one-command setup/teardown
- Added Grafana dashboard provisioning via ConfigMap
- Added traffic generators with read/write split (primary vs replicas)
- Added OTEL_TRACING_ENABLED=true to sidecar injector for trace collection
- Added monitoring docs (overview.md, metrics.md) with architecture diagrams
- Added README with Mermaid architecture diagram

Signed-off-by: urismiley <urismiley@microsoft.com>

- Fix Critical documentdb#1: OTel collector postgresql receiver now uses CNPG superuser secret (copied cross-namespace by deploy.sh) instead of hardcoded 'unused' password
- Fix Major documentdb#2: Add YAML front matter (title, description, tags) to monitoring overview.md and metrics.md
- Fix Major documentdb#4: Add DO NOT USE IN PRODUCTION warnings to grafana.yaml, cluster.yaml, traffic-generator.yaml, and otel-collector.yaml
- Fix Minor documentdb#5: Add note that gateway metric names are versioned independently and may change between releases
- Fix Minor documentdb#6: Document kubeletstats single-node coverage limitation in otel-collector.yaml
- Fix Minor documentdb#9: Document hardcoded PG endpoint coupling to CR name and namespace in otel-collector.yaml

Signed-off-by: urismiley <urismiley@microsoft.com>

Convert the monitoring architecture diagram from ASCII art to a Mermaid graph for better rendering in mkdocs. Enable the pymdownx.superfences mermaid custom fence in mkdocs.yml. Signed-off-by: urismiley <urismiley@microsoft.com>
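For reference, enabling Mermaid through `pymdownx.superfences` usually means adding a custom fence like the one below to mkdocs.yml. This follows the commonly documented configuration and is a sketch, not a copy of the PR's diff.

```yaml
markdown_extensions:
  - pymdownx.superfences:
      custom_fences:
        # Routes fenced blocks tagged "mermaid" to the diagram renderer.
        - name: mermaid
          class: mermaid
          format: !!python/name:pymdownx.superfences.fence_code_format
```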

Copilot

Pull request overview

Adds a local “telemetry playground” for running a DocumentDB HA demo on Kind with a full observability stack, and introduces monitoring documentation pages (with MkDocs nav updates) describing the telemetry/metrics architecture and references. It also updates the CNPG sidecar injector to set per-pod OpenTelemetry resource attributes for better per-instance attribution.

Changes:

- Add documentdb-playground/telemetry/local/ demo (Kind setup/deploy/teardown scripts, k8s manifests, traffic generator, Grafana dashboards).
- Add Monitoring docs (overview.md, metrics.md) and update mkdocs.yml nav + Mermaid fenced blocks support.
- Update sidecar injector to inject POD_NAME and OTEL_RESOURCE_ATTRIBUTES (and enable tracing via env var).

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle.go | Injects additional OTEL-related env vars into the gateway sidecar for per-instance attribution. |
| mkdocs.yml | Adds Monitoring section to nav and enables Mermaid code fences via `pymdownx.superfences`. |
| documentdb-playground/telemetry/local/scripts/teardown.sh | Adds Kind cluster teardown + proxy container cleanup. |
| documentdb-playground/telemetry/local/scripts/setup-kind.sh | Adds Kind cluster + local registry bootstrap script. |
| documentdb-playground/telemetry/local/scripts/deploy.sh | Adds one-command deploy flow for observability stack + DocumentDB + traffic. |
| documentdb-playground/telemetry/local/k8s/traffic/traffic-generator.yaml | Adds RW/RO traffic generator Jobs and gateway Services for demo load. |
| documentdb-playground/telemetry/local/k8s/observability/tempo.yaml | Adds Tempo deployment/service for traces backend. |
| documentdb-playground/telemetry/local/k8s/observability/prometheus.yaml | Adds Prometheus deployment/service and scrape config for the collector. |
| documentdb-playground/telemetry/local/k8s/observability/otel-collector.yaml | Adds OTel Collector deployment, RBAC, and pipelines for metrics/traces/logs. |
| documentdb-playground/telemetry/local/k8s/observability/namespace.yaml | Adds `observability` namespace manifest. |
| documentdb-playground/telemetry/local/k8s/observability/loki.yaml | Adds Loki deployment/service for logs backend. |
| documentdb-playground/telemetry/local/k8s/observability/grafana.yaml | Adds Grafana deployment/service plus datasource & dashboard provisioning. |
| documentdb-playground/telemetry/local/k8s/documentdb/collector-bridge.yaml | Adds ExternalName bridge Service so gateway OTLP endpoint resolves across namespaces. |
| documentdb-playground/telemetry/local/k8s/documentdb/cluster.yaml | Adds demo DocumentDB namespace, credentials, and DocumentDB CR for 3-instance HA. |
| documentdb-playground/telemetry/local/dashboards/internals.json | Adds “Internals” Grafana dashboard JSON. |
| documentdb-playground/telemetry/local/dashboards/gateway.json | Adds “Gateway” Grafana dashboard JSON. |
| documentdb-playground/telemetry/local/README.md | Documents local playground usage and architecture (diagram + steps). |
| docs/operator-public-documentation/preview/monitoring/overview.md | Adds monitoring architecture + setup/verification guidance. |
| docs/operator-public-documentation/preview/monitoring/metrics.md | Adds metrics reference and PromQL examples across signal sources. |

- Fix Mermaid diagram: 'remote write' → 'scrape :8889' (Prometheus scrapes the collector, not the other way around)
- Derive CONTEXT from CLUSTER_NAME in deploy.sh for consistency with setup-kind.sh
- Add 3-minute timeout to CNPG superuser secret wait loop
- Pin mongodb-community-server image to 7.0.30-ubuntu2204
- Fix '3-node' → '3-instance' in overview.md (1 node, 3 instances)

Signed-off-by: urismiley <urismiley@microsoft.com>
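The pull-based direction described in the first fix corresponds to a Prometheus scrape job along these lines. This is a sketch: the job name and the Service DNS name are hypothetical, and only port 8889 and the `observability` namespace come from this PR.

```yaml
scrape_configs:
  # Prometheus pulls from the collector's Prometheus exporter endpoint.
  - job_name: otel-collector          # hypothetical job name
    static_configs:
      - targets:
          - otel-collector.observability.svc.cluster.local:8889   # Service name assumed
```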

alaye-ms · 2026-03-27T19:05:33Z

documentdb-playground/telemetry/local/README.md

+- **Helm 3** — [install](https://helm.sh/docs/intro/install/)
+- **jq** — for credential copying
+
+!!! important "Gateway OTEL support required"


These docs aren't rendered through the MkDocs build, so the `!!! important` admonition syntax doesn't work here.

alaye-ms · 2026-03-27T19:07:17Z

documentdb-playground/telemetry/local/README.md

+./scripts/validate.sh
+
+# Tear down
+./scripts/teardown.sh


Maybe this could have a separate heading for cleanup instead of being in quickstart

I see you do have a section for it, so I think it can be removed here.

WentingWu666666

docs/operator-public-documentation/preview/monitoring/overview.md (lines 23-30)

Instead of asking users to build their own gateway image from a specific branch, it would be better to publish a pre-built gateway image (e.g., to ghcr.io/documentdb/...) that includes the OTEL instrumentation. Requiring users to build the image themselves adds friction and is error-prone; most users won't have the build toolchain set up for the gateway. Could we provide a ready-to-use image with telemetry support, or at least track an issue to do so before GA?

xgerman reviewed Mar 11, 2026

udsmicrosoft force-pushed the users/urismiley/telemetry-playground branch from ae99e4f to 04ef816 on March 16, 2026 16:20

udsmicrosoft added 2 commits March 16, 2026 12:32

udsmicrosoft marked this pull request as ready for review March 16, 2026 16:40

udsmicrosoft requested review from alaye-ms and hossain-rayhan as code owners March 16, 2026 16:40

Copilot AI review requested due to automatic review settings March 16, 2026 16:40

Copilot started reviewing on behalf of udsmicrosoft March 16, 2026 16:42

Copilot AI reviewed Mar 16, 2026

udsmicrosoft added 6 commits March 16, 2026 12:54

Add prerequisite note for gateway OTEL support (documentdb#443)

37a5951

Document that the gateway image must include OpenTelemetry instrumentation from documentdb/documentdb#443 for the telemetry playground and monitoring features to work. Signed-off-by: urismiley <urismiley@microsoft.com>

resolve merge conflict in mkdocs.yml

ea6352f

Signed-off-by: urismiley <urismiley@microsoft.com>

resolve merge conflict in mkdocs.yml

db29135

Signed-off-by: urismiley <urismiley@microsoft.com>

Improve telemetry playground: OTel best practices, alerting, logs panel, validation script, kubeletstats DaemonSet

ffb48c7

Signed-off-by: urismiley <urismiley@microsoft.com>

Fix monitoring docs: correct controller names, add CNPG caveat, clarify OTel naming, trim verbose examples

dae61fe

Signed-off-by: urismiley <urismiley@microsoft.com>

udsmicrosoft force-pushed the users/urismiley/telemetry-playground branch 3 times, most recently from b3cdc5f to dae61fe, on March 25, 2026 06:36

udsmicrosoft added 4 commits March 25, 2026 03:03

Fix namespace race in deploy.sh and update README with operator prerequisite instructions

8d97551

Signed-off-by: urismiley <urismiley@microsoft.com>

Fix deploy.sh secret detection for CNPG 1.28 and add exposeViaService to cluster CR

89745bf

Signed-off-by: urismiley <urismiley@microsoft.com>

Fix gateway dashboard instance variable to use db_client_operations_total

6355d50

Signed-off-by: urismiley <urismiley@microsoft.com>

Fix dashboards: remap to cnpg_* metrics, remove panels for unimplemented gateway metrics

dfeb8fb

Signed-off-by: urismiley <urismiley@microsoft.com>

udsmicrosoft force-pushed the users/urismiley/telemetry-playground branch from 9b0e6b5 to dfeb8fb on March 25, 2026 18:15

udsmicrosoft added 2 commits March 25, 2026 14:24

Fix dashboards: remap PG ops to cnpg tup_ metrics, WAL size, split doc/sec by collection

153a76e

Signed-off-by: urismiley <urismiley@microsoft.com>

docs: clarify gateway OTel prerequisite: base instrumentation vs full metrics-expansion branch

39680c3

Signed-off-by: urismiley <urismiley@microsoft.com>

udsmicrosoft requested a review from WentingWu666666 as a code owner March 26, 2026 16:18

Make deploy.sh self-contained: install operator from GHCR, default to published images, add custom image docs

f28dd95

Signed-off-by: urismiley <urismiley@microsoft.com>

udsmicrosoft added 2 commits March 26, 2026 12:28

docs: remove stale operator prerequisite, fix latency description

eac15b2

Signed-off-by: urismiley <urismiley@microsoft.com>

Fix alert rules to use cnpg_* metrics, replace python3 with grep in validate.sh, fix doc metric name

ff27b7a

Signed-off-by: urismiley <urismiley@microsoft.com>

alaye-ms approved these changes Mar 27, 2026

WentingWu666666 reviewed Mar 28, 2026

udsmicrosoft wants to merge 18 commits into documentdb:main from udsmicrosoft:users/urismiley/telemetry-playground

5 participants
