From 525700b8fe89553ee96422fb93fca848208d2a7e Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Thu, 26 Mar 2026 12:34:26 -0400 Subject: [PATCH 01/13] feat: add two-phase extension upgrade with spec.schemaVersion Add spec.schemaVersion field to DocumentDBSpec to decouple binary (image) upgrades from schema (ALTER EXTENSION) upgrades. This provides a rollback-safe window between deploying a new binary and committing the schema change. Three modes: - Empty (default): Two-phase mode. Schema stays at current version until user explicitly sets schemaVersion. Safe by default for production. - "auto": Auto-finalize. Schema updates to match binary version automatically. Simple mode for development and testing. - Explicit version: Schema updates to exactly that version. Must be <= binary. Changes: - api/preview/documentdb_types.go: Add SchemaVersion to DocumentDBSpec - internal/controller/documentdb_controller.go: Add determineSchemaTarget() function, modify upgradeDocumentDBIfNeeded() to gate ALTER EXTENSION on spec.schemaVersion value - internal/utils/util.go: Add SemverToExtensionVersion() inverse conversion - Regenerated CRDs (config/crd + helm chart) - Added unit tests for all three modes and edge cases - Created public upgrade documentation (docs/operator-public-documentation/) - Added Upgrading DocumentDB page to mkdocs.yml navigation Closes #271 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Wenting Wu --- .../preview/operations/upgrades.md | 100 +++- .../crds/documentdb.io_dbs.yaml | 19 + operator/src/api/preview/documentdb_types.go | 25 + .../config/crd/bases/documentdb.io_dbs.yaml | 19 + .../controller/documentdb_controller.go | 146 +++++- .../controller/documentdb_controller_test.go | 435 ++++++++++++++++++ operator/src/internal/utils/util.go | 10 + operator/src/internal/utils/util_test.go | 25 + 8 files changed, 750 insertions(+), 29 deletions(-) diff --git a/docs/operator-public-documentation/preview/operations/upgrades.md b/docs/operator-public-documentation/preview/operations/upgrades.md index f279c969..e8a8e4b9 100644 --- a/docs/operator-public-documentation/preview/operations/upgrades.md +++ b/docs/operator-public-documentation/preview/operations/upgrades.md @@ -112,6 +112,22 @@ helm rollback documentdb-operator -n documentdb-operator Updating `spec.documentDBVersion` upgrades **both** the DocumentDB extension and the gateway together, since they share the same version. +### Schema Version Control + +**The operator never changes your database schema unless you ask.** + +- Set `documentDBVersion` → updates the binary (safe to roll back) +- Set `schemaVersion` → updates the database schema (irreversible) +- Set `schemaVersion: "auto"` → schema auto-updates with binary + +Think of it as: **`documentDBVersion` installs the software, `schemaVersion` applies the database migration.** + +| `spec.schemaVersion` | Behavior | Use case | +|----------------------|----------|----------| +| Not set (default) | Binary upgrades. Schema stays at current version until you set `schemaVersion`. | Production — gives you a rollback-safe window before committing the schema change. | +| `"auto"` | Schema updates automatically with binary. No rollback window. | Development and testing — simple, one-step upgrades. | +| Explicit version (e.g. `"0.112.0"`) | Schema updates to exactly that version. | Controlled rollouts — you choose when and what version to finalize. | + ### Pre-Upgrade Checklist 1. **Check the CHANGELOG** — review release notes for breaking changes. @@ -121,21 +137,58 @@ Updating `spec.documentDBVersion` upgrades **both** the DocumentDB extension and ### Step 1: Update the DocumentDB Version !!! danger - **Downgrades are not supported.** If you set `documentDBVersion` to a version lower than the currently installed schema version, the operator will still update the container images but will **skip the schema migration** (`ALTER EXTENSION UPDATE`) and emit a warning event. This leaves the DocumentDB cluster running an older binary against a newer schema, which may cause unexpected behavior. Always move forward to a newer version or restore from backup. - -```yaml title="documentdb.yaml" -apiVersion: documentdb.io/preview -kind: DocumentDB -metadata: - name: my-cluster - namespace: default -spec: - documentDBVersion: "" -``` + **Downgrades are not supported.** Once the schema has been updated (`status.schemaVersion` shows the new version), the operator **blocks** any attempt to set `documentDBVersion` to a version lower than the installed schema version. If you need to recover after a schema update, restore from backup. See [Rollback and Recovery](#rollback-and-recovery). -```bash -kubectl apply -f documentdb.yaml -``` +=== "Default (controlled upgrade)" + + Update only the binary version. The schema stays at the current version until you explicitly finalize: + + ```yaml title="documentdb.yaml" + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-cluster + namespace: default + spec: + documentDBVersion: "" + # schemaVersion is not set — schema stays, safe to roll back + ``` + + ```bash + kubectl apply -f documentdb.yaml + ``` + + After the binary upgrade completes and you've validated the cluster, finalize the schema: + + ```bash + kubectl patch documentdb my-cluster -n default \ + --type merge -p '{"spec":{"schemaVersion":""}}' + ``` + + !!! tip + On subsequent upgrades, just update `documentDBVersion` again. The schema stays pinned at the previous `schemaVersion` value until you update it. + +=== "Auto mode" + + Update both the binary and schema in one step: + + ```yaml title="documentdb.yaml" + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-cluster + namespace: default + spec: + documentDBVersion: "" + schemaVersion: "auto" + ``` + + ```bash + kubectl apply -f documentdb.yaml + ``` + + !!! warning + With `schemaVersion: "auto"`, the schema migration is irreversible once applied. You cannot roll back to the previous version — only restore from backup. ### Step 2: Monitor the Upgrade @@ -154,9 +207,9 @@ kubectl get documentdb my-cluster -n default -o jsonpath='{.status.schemaVersion Whether you can roll back depends on whether the schema has been updated: -=== "Schema not yet updated" +=== "Schema not yet updated (rollback safe)" - If `status.schemaVersion` still shows the **previous** version, the extension schema migration has not run yet. You can roll back by reverting `spec.documentDBVersion` to the previous value: + If `status.schemaVersion` still shows the **previous** version, the extension schema migration has not run yet. You can safely roll back by reverting `spec.documentDBVersion` to the previous value: ```bash # Check the current schema version @@ -169,20 +222,25 @@ Whether you can roll back depends on whether the schema has been updated: kubectl apply -f documentdb.yaml ``` -=== "Schema already updated" +=== "Schema already updated (rollback blocked)" + + If `status.schemaVersion` shows the **new** version, the schema migration has already been applied and **cannot be reversed**. The operator will **block** any attempt to roll back the binary image below the installed schema version, because running an older binary against a newer schema is untested and may cause data corruption. - If `status.schemaVersion` shows the **new** version, the schema migration has already been applied and **cannot be reversed**. To recover: Use the backup you created in the [Pre-Upgrade Checklist](#pre-upgrade-checklist) to restore the DocumentDB cluster to its pre-upgrade state. See [Backup and Restore](backup-and-restore.md) for instructions. + To recover: Use the backup you created in the [Pre-Upgrade Checklist](#pre-upgrade-checklist) to restore the DocumentDB cluster to its pre-upgrade state. See [Backup and Restore](backup-and-restore.md) for instructions. !!! tip - This is why backing up before a component upgrade is critical. Once the schema is updated, there is no rollback — only restore. + This is why the default two-phase mode exists — it gives you a rollback-safe window before committing the schema change. Always back up before upgrading, and validate the new binary before setting `schemaVersion`. ### How It Works 1. You update the `spec.documentDBVersion` field. 2. The operator detects the version change and updates both the database image and the gateway sidecar image. 3. The underlying cluster manager performs a **rolling restart**: replicas are restarted first one at a time, then the **primary is restarted in place**. Expect a brief period of downtime while the primary pod restarts. -4. After the primary pod is healthy, the operator runs `ALTER EXTENSION documentdb UPDATE` to update the database schema. -5. The operator tracks the schema version in `status.schemaVersion`. +4. After the primary pod is healthy, the operator checks `spec.schemaVersion`: + - **Not set (default)**: The operator **skips** the schema migration and emits a `SchemaUpdateAvailable` event. The binary is updated but the database schema is unchanged — you can safely roll back by reverting `documentDBVersion`. + - **`"auto"`**: The operator runs `ALTER EXTENSION documentdb UPDATE` to update the database schema to match the binary. This is irreversible. + - **Explicit version**: The operator runs `ALTER EXTENSION documentdb UPDATE TO ''` to update to that specific version. This is irreversible. +5. The operator tracks the current schema version in `status.schemaVersion`. ### Advanced: Independent Image Overrides diff --git a/operator/documentdb-helm-chart/crds/documentdb.io_dbs.yaml b/operator/documentdb-helm-chart/crds/documentdb.io_dbs.yaml index ee2827f5..36191073 100644 --- a/operator/documentdb-helm-chart/crds/documentdb.io_dbs.yaml +++ b/operator/documentdb-helm-chart/crds/documentdb.io_dbs.yaml @@ -1264,6 +1264,25 @@ spec: required: - storage type: object + schemaVersion: + description: |- + SchemaVersion controls the desired schema version for the DocumentDB extension. + + This field decouples the extension binary (image) update from the schema update + (ALTER EXTENSION documentdb UPDATE), providing a rollback-safe upgrade window. + + Values: + - "" (empty, default): Two-phase mode. Image upgrades happen automatically, + but ALTER EXTENSION UPDATE does NOT run. Users must explicitly set this + field to finalize the schema upgrade. This is the safest option for production + as it allows rollback by reverting the image before committing the schema change. + - "auto": Schema automatically updates to match the binary version whenever + the binary is upgraded. This is the simplest mode but provides no rollback + safety window. Recommended for development and testing environments. + - "" (e.g. "0.112.0"): Schema updates to exactly this version. + Must be <= the binary version. + pattern: ^(auto|[0-9]+\.[0-9]+\.[0-9]+)?$ + type: string sidecarInjectorPluginName: description: SidecarInjectorPluginName is the name of the sidecar injector plugin to use. diff --git a/operator/src/api/preview/documentdb_types.go b/operator/src/api/preview/documentdb_types.go index 72185332..4d95ac9c 100644 --- a/operator/src/api/preview/documentdb_types.go +++ b/operator/src/api/preview/documentdb_types.go @@ -103,6 +103,31 @@ type DocumentDBSpec struct { // +kubebuilder:validation:XValidation:rule="self.all(key, key in ['ChangeStreams'])",message="unsupported feature gate key; allowed keys: ChangeStreams" FeatureGates map[string]bool `json:"featureGates,omitempty"` + // SchemaVersion controls the desired schema version for the DocumentDB extension. + // + // The operator never changes your database schema unless you ask: + // - Set documentDBVersion → updates the binary (safe to roll back) + // - Set schemaVersion → updates the database schema (irreversible) + // - Set schemaVersion: "auto" → schema auto-updates with binary + // + // Once the schema has been updated, the operator blocks image rollback below the + // installed schema version to prevent running an untested binary/schema combination. + // + // Values: + // - "" (empty, default): Two-phase mode. Image upgrades happen automatically, + // but ALTER EXTENSION UPDATE does NOT run. Users must explicitly set this + // field to finalize the schema upgrade. This is the safest option for production + // as it allows rollback by reverting the image before committing the schema change. + // - "auto": Schema automatically updates to match the binary version whenever + // the binary is upgraded. This is the simplest mode but provides no rollback + // safety window. + // - "" (e.g. "0.112.0"): Schema updates to exactly this version. + // Must be <= the binary version. + // + // +kubebuilder:validation:Pattern=`^(auto|[0-9]+\.[0-9]+\.[0-9]+)?$` + // +optional + SchemaVersion string `json:"schemaVersion,omitempty"` + // Affinity/Anti-affinity rules for Pods (cnpg passthrough) // +optional Affinity cnpgv1.AffinityConfiguration `json:"affinity,omitempty"` diff --git a/operator/src/config/crd/bases/documentdb.io_dbs.yaml b/operator/src/config/crd/bases/documentdb.io_dbs.yaml index ee2827f5..36191073 100644 --- a/operator/src/config/crd/bases/documentdb.io_dbs.yaml +++ b/operator/src/config/crd/bases/documentdb.io_dbs.yaml @@ -1264,6 +1264,25 @@ spec: required: - storage type: object + schemaVersion: + description: |- + SchemaVersion controls the desired schema version for the DocumentDB extension. + + This field decouples the extension binary (image) update from the schema update + (ALTER EXTENSION documentdb UPDATE), providing a rollback-safe upgrade window. + + Values: + - "" (empty, default): Two-phase mode. Image upgrades happen automatically, + but ALTER EXTENSION UPDATE does NOT run. Users must explicitly set this + field to finalize the schema upgrade. This is the safest option for production + as it allows rollback by reverting the image before committing the schema change. + - "auto": Schema automatically updates to match the binary version whenever + the binary is upgraded. This is the simplest mode but provides no rollback + safety window. Recommended for development and testing environments. + - "" (e.g. "0.112.0"): Schema updates to exactly this version. + Must be <= the binary version. + pattern: ^(auto|[0-9]+\.[0-9]+\.[0-9]+)?$ + type: string sidecarInjectorPluginName: description: SidecarInjectorPluginName is the name of the sidecar injector plugin to use. diff --git a/operator/src/internal/controller/documentdb_controller.go b/operator/src/internal/controller/documentdb_controller.go index 0b265ebc..71d8f081 100644 --- a/operator/src/internal/controller/documentdb_controller.go +++ b/operator/src/internal/controller/documentdb_controller.go @@ -835,8 +835,9 @@ func parseExtensionVersionsFromOutput(output string) (defaultVersion, installedV // upgradeDocumentDBIfNeeded handles the complete DocumentDB image upgrade process: // 1. Checks if extension image and/or gateway image need updating (builds a single JSON Patch) +// 1b. Blocks extension image rollback below the installed schema version (irreversible) // 2. If images changed, applies the patch atomically (triggers one CNPG rolling restart) and returns -// 3. After rolling restart completes, runs ALTER EXTENSION documentdb UPDATE if needed +// 3. After rolling restart completes, checks spec.schemaVersion to decide whether to run ALTER EXTENSION // 4. Updates the DocumentDB status with the new extension version func (r *DocumentDBReconciler) upgradeDocumentDBIfNeeded(ctx context.Context, currentCluster, desiredCluster *cnpgv1.Cluster, documentdb *dbpreview.DocumentDB) error { logger := log.FromContext(ctx) @@ -852,6 +853,42 @@ func (r *DocumentDBReconciler) upgradeDocumentDBIfNeeded(ctx context.Context, cu return fmt.Errorf("failed to build image patch operations: %w", err) } + // Step 1b: Block extension image rollback below installed schema version. + // Once ALTER EXTENSION UPDATE has run, the schema is irreversible. Running an older + // binary against a newer schema is untested and may cause data corruption. + if extensionUpdated && documentdb.Status.SchemaVersion != "" { + // Extract the version tag from the desired extension image reference + desiredExtImage := "" + for _, ext := range desiredCluster.Spec.PostgresConfiguration.Extensions { + if ext.Name == "documentdb" { + desiredExtImage = ext.ImageVolumeSource.Reference + break + } + } + if desiredExtImage != "" { + // Extract tag from image reference (e.g., "ghcr.io/.../documentdb:0.112.0" → "0.112.0") + if tagIdx := strings.LastIndex(desiredExtImage, ":"); tagIdx >= 0 { + newTag := desiredExtImage[tagIdx+1:] + // Convert both to extension format for comparison + newExtVersion := util.SemverToExtensionVersion(newTag) + schemaExtVersion := util.SemverToExtensionVersion(documentdb.Status.SchemaVersion) + cmp, err := util.CompareExtensionVersions(newExtVersion, schemaExtVersion) + if err == nil && cmp < 0 { + msg := fmt.Sprintf( + "Image rollback blocked: requested image version %s is older than installed schema version %s. "+ + "ALTER EXTENSION has no downgrade path — running an older binary with a newer schema may cause data corruption. "+ + "To recover, restore from backup or update to a version >= %s.", + newTag, documentdb.Status.SchemaVersion, documentdb.Status.SchemaVersion) + logger.Info(msg) + if r.Recorder != nil { + r.Recorder.Event(documentdb, corev1.EventTypeWarning, "ImageRollbackBlocked", msg) + } + return nil + } + } + } + } + // Step 2: Apply patch if any images need updating if len(patchOps) > 0 { patchBytes, err := json.Marshal(patchOps) @@ -951,7 +988,10 @@ func (r *DocumentDBReconciler) upgradeDocumentDBIfNeeded(ctx context.Context, cu return nil } - // Step 5b: Rollback detection — check if the new binary is older than the installed schema + // Step 5b: Rollback detection — secondary safety net. + // The primary rollback block is in Step 1b (prevents image patch when new image < schema). + // This handles edge cases where the binary version is already running but older than schema + // (e.g., if Step 1b was added after an earlier rollback already took effect). cmp, err := util.CompareExtensionVersions(defaultVersion, installedVersion) if err != nil { logger.Error(err, "Failed to compare extension versions, skipping ALTER EXTENSION as a safety measure", @@ -976,27 +1016,39 @@ func (r *DocumentDBReconciler) upgradeDocumentDBIfNeeded(ctx context.Context, cu return nil } - // Step 6: Run ALTER EXTENSION to upgrade (cmp > 0: defaultVersion > installedVersion) + // Step 6: Determine schema target based on spec.schemaVersion (two-phase upgrade logic) + // + // Three modes: + // - "" (empty): Two-phase mode — do NOT auto-update schema. User must explicitly + // set spec.schemaVersion to finalize. This provides a rollback-safe window. + // - "auto": Auto-finalize — update schema to match binary version automatically. + // - "": Explicit pin — update schema to exactly this version. + schemaTarget, updateSQL := r.determineSchemaTarget(ctx, documentdb, defaultVersion, installedVersion) + if schemaTarget == "" { + // Two-phase mode or validation failure — do not run ALTER EXTENSION + return nil + } + + // Step 7: Run ALTER EXTENSION to upgrade logger.Info("Upgrading DocumentDB extension", "fromVersion", installedVersion, - "toVersion", defaultVersion) + "toVersion", schemaTarget) - updateSQL := "ALTER EXTENSION documentdb UPDATE" if _, err := r.SQLExecutor(ctx, currentCluster, updateSQL); err != nil { return fmt.Errorf("failed to run ALTER EXTENSION documentdb UPDATE: %w", err) } logger.Info("Successfully upgraded DocumentDB extension", "fromVersion", installedVersion, - "toVersion", defaultVersion) + "toVersion", schemaTarget) - // Step 7: Update DocumentDB schema version in status after upgrade + // Step 8: Update DocumentDB schema version in status after upgrade // Re-fetch to get latest resourceVersion before status update if err := r.Get(ctx, types.NamespacedName{Name: documentdb.Name, Namespace: documentdb.Namespace}, documentdb); err != nil { return fmt.Errorf("failed to refetch DocumentDB after schema upgrade: %w", err) } // Convert from pg_available_extensions format ("0.110-0") to semver ("0.110.0") - documentdb.Status.SchemaVersion = util.ExtensionVersionToSemver(defaultVersion) + documentdb.Status.SchemaVersion = util.ExtensionVersionToSemver(schemaTarget) if err := r.Status().Update(ctx, documentdb); err != nil { logger.Error(err, "Failed to update DocumentDB status after schema upgrade") return fmt.Errorf("failed to update DocumentDB status after schema upgrade: %w", err) @@ -1005,6 +1057,84 @@ func (r *DocumentDBReconciler) upgradeDocumentDBIfNeeded(ctx context.Context, cu return nil } +// determineSchemaTarget decides the target schema version based on spec.schemaVersion. +// Returns the target version (in pg_available_extensions format) and the SQL to execute. +// Returns ("", "") if no schema update should run (two-phase mode or validation failure). +func (r *DocumentDBReconciler) determineSchemaTarget( + ctx context.Context, + documentdb *dbpreview.DocumentDB, + binaryVersion string, + installedVersion string, +) (string, string) { + logger := log.FromContext(ctx) + specSchemaVersion := documentdb.Spec.SchemaVersion + + switch { + case specSchemaVersion == "": + // Two-phase mode: schema stays at current version until user explicitly sets schemaVersion + logger.Info("Schema update available but not requested (two-phase mode). "+ + "Set spec.schemaVersion to finalize the upgrade.", + "binaryVersion", binaryVersion, + "installedVersion", installedVersion) + if r.Recorder != nil { + msg := fmt.Sprintf( + "Schema update available: binary version is %s, schema is at %s. "+ + "Set spec.schemaVersion to %q or \"auto\" to finalize the upgrade.", + util.ExtensionVersionToSemver(binaryVersion), + util.ExtensionVersionToSemver(installedVersion), + util.ExtensionVersionToSemver(binaryVersion)) + r.Recorder.Event(documentdb, corev1.EventTypeNormal, "SchemaUpdateAvailable", msg) + } + return "", "" + + case specSchemaVersion == "auto": + // Auto-finalize: update schema to match binary version + return binaryVersion, "ALTER EXTENSION documentdb UPDATE" + + default: + // Explicit version: validate and update to that specific version + // Convert semver ("0.112.0") to pg extension format ("0.112-0") for comparison + targetPgVersion := util.SemverToExtensionVersion(specSchemaVersion) + + // Validate: target must not exceed binary version + targetCmp, err := util.CompareExtensionVersions(targetPgVersion, binaryVersion) + if err != nil { + logger.Error(err, "Failed to compare target schema version with binary version", + "targetVersion", specSchemaVersion, + "binaryVersion", binaryVersion) + return "", "" + } + if targetCmp > 0 { + msg := fmt.Sprintf( + "Requested schema version %s exceeds binary version %s. "+ + "Schema version must be <= binary version. Skipping schema update.", + specSchemaVersion, util.ExtensionVersionToSemver(binaryVersion)) + logger.Info(msg) + if r.Recorder != nil { + r.Recorder.Event(documentdb, corev1.EventTypeWarning, "SchemaVersionExceedsBinary", msg) + } + return "", "" + } + + // Validate: target must be > installed version (otherwise no-op) + installedCmp, err := util.CompareExtensionVersions(targetPgVersion, installedVersion) + if err != nil { + logger.Error(err, "Failed to compare target schema version with installed version", + "targetVersion", specSchemaVersion, + "installedVersion", installedVersion) + return "", "" + } + if installedCmp <= 0 { + logger.V(1).Info("Schema already at or beyond requested version", + "requestedVersion", specSchemaVersion, + "installedVersion", installedVersion) + return "", "" + } + + return targetPgVersion, fmt.Sprintf("ALTER EXTENSION documentdb UPDATE TO '%s'", targetPgVersion) + } +} + // buildImagePatchOps compares the current and desired CNPG cluster specs and returns // JSON Patch operations for any image differences (extension image settings and/or gateway image). // This is a pure function with no API calls. Returns: diff --git a/operator/src/internal/controller/documentdb_controller_test.go b/operator/src/internal/controller/documentdb_controller_test.go index 59ef63b5..3f60fddf 100644 --- a/operator/src/internal/controller/documentdb_controller_test.go +++ b/operator/src/internal/controller/documentdb_controller_test.go @@ -1696,6 +1696,9 @@ var _ = Describe("DocumentDB Controller", func() { Name: "test-documentdb", Namespace: clusterNamespace, }, + Spec: dbpreview.DocumentDBSpec{ + SchemaVersion: "auto", + }, } fakeClient := fake.NewClientBuilder(). @@ -1767,6 +1770,9 @@ var _ = Describe("DocumentDB Controller", func() { Name: "test-documentdb", Namespace: clusterNamespace, }, + Spec: dbpreview.DocumentDBSpec{ + SchemaVersion: "auto", + }, } fakeClient := fake.NewClientBuilder(). @@ -2241,6 +2247,9 @@ var _ = Describe("DocumentDB Controller", func() { Name: "test-documentdb", Namespace: clusterNamespace, }, + Spec: dbpreview.DocumentDBSpec{ + SchemaVersion: "auto", + }, Status: dbpreview.DocumentDBStatus{ // Images match cluster so updateImageStatus is a no-op DocumentDBImage: "documentdb/documentdb:v1.0.0", @@ -2282,6 +2291,432 @@ var _ = Describe("DocumentDB Controller", func() { }) }) + Describe("two-phase upgrade (spec.schemaVersion)", func() { + It("should skip ALTER EXTENSION when schemaVersion is empty (two-phase mode)", func() { + cluster := &cnpgv1.Cluster{ + ObjectMeta: metav1.ObjectMeta{ + Name: clusterName, + Namespace: clusterNamespace, + }, + Spec: cnpgv1.ClusterSpec{ + PostgresConfiguration: cnpgv1.PostgresConfiguration{ + Extensions: []cnpgv1.ExtensionConfiguration{ + { + Name: "documentdb", + ImageVolumeSource: corev1.ImageVolumeSource{ + Reference: "documentdb/documentdb:v1.0.0", + }, + }, + }, + }, + }, + Status: cnpgv1.ClusterStatus{ + CurrentPrimary: "test-cluster-1", + InstancesStatus: map[cnpgv1.PodStatus][]string{ + cnpgv1.PodHealthy: {"test-cluster-1"}, + }, + }, + } + + desiredCluster := cluster.DeepCopy() + + // schemaVersion is empty (default) → two-phase mode + documentdb := &dbpreview.DocumentDB{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-documentdb", + Namespace: clusterNamespace, + }, + } + + fakeClient := fake.NewClientBuilder(). + WithScheme(scheme). + WithObjects(cluster, documentdb). + WithStatusSubresource(&dbpreview.DocumentDB{}). + Build() + + sqlCalls := []string{} + reconciler := &DocumentDBReconciler{ + Client: fakeClient, + Scheme: scheme, + Recorder: recorder, + SQLExecutor: func(_ context.Context, _ *cnpgv1.Cluster, sql string) (string, error) { + sqlCalls = append(sqlCalls, sql) + // Binary is 0.110-0, installed is 0.109-0 → upgrade available + return " default_version | installed_version \n-----------------+-------------------\n 0.110-0 | 0.109-0 \n", nil + }, + } + + err := reconciler.upgradeDocumentDBIfNeeded(ctx, cluster, desiredCluster, documentdb) + Expect(err).ToNot(HaveOccurred()) + + // Only version-check SQL should have been called (no ALTER EXTENSION) + Expect(sqlCalls).To(HaveLen(1)) + Expect(sqlCalls[0]).To(ContainSubstring("pg_available_extensions")) + + // Status should reflect installed version, NOT the binary version + updatedDB := &dbpreview.DocumentDB{} + Expect(fakeClient.Get(ctx, types.NamespacedName{Name: "test-documentdb", Namespace: clusterNamespace}, updatedDB)).To(Succeed()) + Expect(updatedDB.Status.SchemaVersion).To(Equal("0.109.0")) + }) + + It("should run ALTER EXTENSION when schemaVersion is 'auto'", func() { + cluster := &cnpgv1.Cluster{ + ObjectMeta: metav1.ObjectMeta{ + Name: clusterName, + Namespace: clusterNamespace, + }, + Spec: cnpgv1.ClusterSpec{ + PostgresConfiguration: cnpgv1.PostgresConfiguration{ + Extensions: []cnpgv1.ExtensionConfiguration{ + { + Name: "documentdb", + ImageVolumeSource: corev1.ImageVolumeSource{ + Reference: "documentdb/documentdb:v1.0.0", + }, + }, + }, + }, + }, + Status: cnpgv1.ClusterStatus{ + CurrentPrimary: "test-cluster-1", + InstancesStatus: map[cnpgv1.PodStatus][]string{ + cnpgv1.PodHealthy: {"test-cluster-1"}, + }, + }, + } + + desiredCluster := cluster.DeepCopy() + + documentdb := &dbpreview.DocumentDB{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-documentdb", + Namespace: clusterNamespace, + }, + Spec: dbpreview.DocumentDBSpec{ + SchemaVersion: "auto", + }, + } + + fakeClient := fake.NewClientBuilder(). + WithScheme(scheme). + WithObjects(cluster, documentdb). + WithStatusSubresource(&dbpreview.DocumentDB{}). + Build() + + sqlCalls := []string{} + reconciler := &DocumentDBReconciler{ + Client: fakeClient, + Scheme: scheme, + Recorder: recorder, + SQLExecutor: func(_ context.Context, _ *cnpgv1.Cluster, sql string) (string, error) { + sqlCalls = append(sqlCalls, sql) + if len(sqlCalls) == 1 { + return " default_version | installed_version \n-----------------+-------------------\n 0.110-0 | 0.109-0 \n", nil + } + return "ALTER EXTENSION", nil + }, + } + + err := reconciler.upgradeDocumentDBIfNeeded(ctx, cluster, desiredCluster, documentdb) + Expect(err).ToNot(HaveOccurred()) + + // Both version-check and ALTER EXTENSION should have been called + Expect(sqlCalls).To(HaveLen(2)) + Expect(sqlCalls[0]).To(ContainSubstring("pg_available_extensions")) + Expect(sqlCalls[1]).To(Equal("ALTER EXTENSION documentdb UPDATE")) + + // Status should reflect the upgraded version + updatedDB := &dbpreview.DocumentDB{} + Expect(fakeClient.Get(ctx, types.NamespacedName{Name: "test-documentdb", Namespace: clusterNamespace}, updatedDB)).To(Succeed()) + Expect(updatedDB.Status.SchemaVersion).To(Equal("0.110.0")) + }) + + It("should run ALTER EXTENSION UPDATE TO specific version when schemaVersion is explicit", func() { + cluster := &cnpgv1.Cluster{ + ObjectMeta: metav1.ObjectMeta{ + Name: clusterName, + Namespace: clusterNamespace, + }, + Spec: cnpgv1.ClusterSpec{ + PostgresConfiguration: cnpgv1.PostgresConfiguration{ + Extensions: []cnpgv1.ExtensionConfiguration{ + { + Name: "documentdb", + ImageVolumeSource: corev1.ImageVolumeSource{ + Reference: "documentdb/documentdb:v1.0.0", + }, + }, + }, + }, + }, + Status: cnpgv1.ClusterStatus{ + CurrentPrimary: "test-cluster-1", + InstancesStatus: map[cnpgv1.PodStatus][]string{ + cnpgv1.PodHealthy: {"test-cluster-1"}, + }, + }, + } + + desiredCluster := cluster.DeepCopy() + + // Explicit version: update to 0.110.0 (binary is also 0.110-0) + documentdb := &dbpreview.DocumentDB{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-documentdb", + Namespace: clusterNamespace, + }, + Spec: dbpreview.DocumentDBSpec{ + SchemaVersion: "0.110.0", + }, + } + + fakeClient := fake.NewClientBuilder(). + WithScheme(scheme). + WithObjects(cluster, documentdb). + WithStatusSubresource(&dbpreview.DocumentDB{}). + Build() + + sqlCalls := []string{} + reconciler := &DocumentDBReconciler{ + Client: fakeClient, + Scheme: scheme, + Recorder: recorder, + SQLExecutor: func(_ context.Context, _ *cnpgv1.Cluster, sql string) (string, error) { + sqlCalls = append(sqlCalls, sql) + if len(sqlCalls) == 1 { + // Binary is 0.110-0, installed is 0.109-0 + return " default_version | installed_version \n-----------------+-------------------\n 0.110-0 | 0.109-0 \n", nil + } + return "ALTER EXTENSION", nil + }, + } + + err := reconciler.upgradeDocumentDBIfNeeded(ctx, cluster, desiredCluster, documentdb) + Expect(err).ToNot(HaveOccurred()) + + // Should run ALTER EXTENSION UPDATE TO specific version + Expect(sqlCalls).To(HaveLen(2)) + Expect(sqlCalls[0]).To(ContainSubstring("pg_available_extensions")) + Expect(sqlCalls[1]).To(Equal("ALTER EXTENSION documentdb UPDATE TO '0.110-0'")) + + // Status should reflect the explicit version + updatedDB := &dbpreview.DocumentDB{} + Expect(fakeClient.Get(ctx, types.NamespacedName{Name: "test-documentdb", Namespace: clusterNamespace}, updatedDB)).To(Succeed()) + Expect(updatedDB.Status.SchemaVersion).To(Equal("0.110.0")) + }) + + It("should emit warning and skip when explicit schemaVersion exceeds binary version", func() { + cluster := &cnpgv1.Cluster{ + ObjectMeta: metav1.ObjectMeta{ + Name: clusterName, + Namespace: clusterNamespace, + }, + Spec: cnpgv1.ClusterSpec{ + PostgresConfiguration: cnpgv1.PostgresConfiguration{ + Extensions: []cnpgv1.ExtensionConfiguration{ + { + Name: "documentdb", + ImageVolumeSource: corev1.ImageVolumeSource{ + Reference: "documentdb/documentdb:v1.0.0", + }, + }, + }, + }, + }, + Status: cnpgv1.ClusterStatus{ + CurrentPrimary: "test-cluster-1", + InstancesStatus: map[cnpgv1.PodStatus][]string{ + cnpgv1.PodHealthy: {"test-cluster-1"}, + }, + }, + } + + desiredCluster := cluster.DeepCopy() + + // Explicit version 0.112.0 exceeds binary version 0.110-0 + documentdb := &dbpreview.DocumentDB{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-documentdb", + Namespace: clusterNamespace, + }, + Spec: dbpreview.DocumentDBSpec{ + SchemaVersion: "0.112.0", + }, + } + + fakeClient := fake.NewClientBuilder(). + WithScheme(scheme). + WithObjects(cluster, documentdb). + WithStatusSubresource(&dbpreview.DocumentDB{}). + Build() + + sqlCalls := []string{} + reconciler := &DocumentDBReconciler{ + Client: fakeClient, + Scheme: scheme, + Recorder: recorder, + SQLExecutor: func(_ context.Context, _ *cnpgv1.Cluster, sql string) (string, error) { + sqlCalls = append(sqlCalls, sql) + // Binary is 0.110-0, installed is 0.109-0 + return " default_version | installed_version \n-----------------+-------------------\n 0.110-0 | 0.109-0 \n", nil + }, + } + + err := reconciler.upgradeDocumentDBIfNeeded(ctx, cluster, desiredCluster, documentdb) + Expect(err).ToNot(HaveOccurred()) + + // Only version-check SQL should have been called (no ALTER EXTENSION) + Expect(sqlCalls).To(HaveLen(1)) + + // Status should reflect installed version (not upgraded) + updatedDB := &dbpreview.DocumentDB{} + Expect(fakeClient.Get(ctx, types.NamespacedName{Name: "test-documentdb", Namespace: clusterNamespace}, updatedDB)).To(Succeed()) + Expect(updatedDB.Status.SchemaVersion).To(Equal("0.109.0")) + }) + + It("should skip when explicit schemaVersion matches installed version", func() { + cluster := &cnpgv1.Cluster{ + ObjectMeta: metav1.ObjectMeta{ + Name: clusterName, + Namespace: clusterNamespace, + }, + Spec: cnpgv1.ClusterSpec{ + PostgresConfiguration: cnpgv1.PostgresConfiguration{ + Extensions: []cnpgv1.ExtensionConfiguration{ + { + Name: "documentdb", + ImageVolumeSource: corev1.ImageVolumeSource{ + Reference: "documentdb/documentdb:v1.0.0", + }, + }, + }, + }, + }, + Status: cnpgv1.ClusterStatus{ + CurrentPrimary: "test-cluster-1", + InstancesStatus: map[cnpgv1.PodStatus][]string{ + cnpgv1.PodHealthy: {"test-cluster-1"}, + }, + }, + } + + desiredCluster := cluster.DeepCopy() + + // Explicit version matches installed → no-op + documentdb := &dbpreview.DocumentDB{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-documentdb", + Namespace: clusterNamespace, + }, + Spec: dbpreview.DocumentDBSpec{ + SchemaVersion: "0.109.0", + }, + } + + fakeClient := fake.NewClientBuilder(). + WithScheme(scheme). + WithObjects(cluster, documentdb). + WithStatusSubresource(&dbpreview.DocumentDB{}). + Build() + + sqlCalls := []string{} + reconciler := &DocumentDBReconciler{ + Client: fakeClient, + Scheme: scheme, + Recorder: recorder, + SQLExecutor: func(_ context.Context, _ *cnpgv1.Cluster, sql string) (string, error) { + sqlCalls = append(sqlCalls, sql) + // Binary is 0.110-0, installed is 0.109-0 + return " default_version | installed_version \n-----------------+-------------------\n 0.110-0 | 0.109-0 \n", nil + }, + } + + err := reconciler.upgradeDocumentDBIfNeeded(ctx, cluster, desiredCluster, documentdb) + Expect(err).ToNot(HaveOccurred()) + + // Only version-check SQL should have been called (no ALTER EXTENSION) + Expect(sqlCalls).To(HaveLen(1)) + }) + + It("should block image rollback when new image version is below installed schema version", func() { + // Current cluster has old image + cluster := &cnpgv1.Cluster{ + ObjectMeta: metav1.ObjectMeta{ + Name: clusterName, + Namespace: clusterNamespace, + }, + Spec: cnpgv1.ClusterSpec{ + PostgresConfiguration: cnpgv1.PostgresConfiguration{ + Extensions: []cnpgv1.ExtensionConfiguration{ + { + Name: "documentdb", + ImageVolumeSource: corev1.ImageVolumeSource{ + Reference: "ghcr.io/documentdb/documentdb:0.110.0", + }, + }, + }, + }, + }, + Status: cnpgv1.ClusterStatus{ + CurrentPrimary: "test-cluster-1", + InstancesStatus: map[cnpgv1.PodStatus][]string{ + cnpgv1.PodHealthy: {"test-cluster-1"}, + }, + }, + } + + // Desired cluster has OLDER image (rollback attempt) + desiredCluster := cluster.DeepCopy() + desiredCluster.Spec.PostgresConfiguration.Extensions[0].ImageVolumeSource.Reference = "ghcr.io/documentdb/documentdb:0.109.0" + + // Schema is already at 0.110.0 (ALTER EXTENSION was previously applied) + documentdb := &dbpreview.DocumentDB{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-documentdb", + Namespace: clusterNamespace, + }, + Status: dbpreview.DocumentDBStatus{ + SchemaVersion: "0.110.0", + }, + } + + fakeClient := fake.NewClientBuilder(). + WithScheme(scheme). + WithObjects(cluster, documentdb). + WithStatusSubresource(&dbpreview.DocumentDB{}). + Build() + + sqlCallCount := 0 + reconciler := &DocumentDBReconciler{ + Client: fakeClient, + Scheme: scheme, + Recorder: recorder, + SQLExecutor: func(_ context.Context, _ *cnpgv1.Cluster, sql string) (string, error) { + sqlCallCount++ + return "", nil + }, + } + + err := reconciler.upgradeDocumentDBIfNeeded(ctx, cluster, desiredCluster, documentdb) + Expect(err).ToNot(HaveOccurred()) + + // No SQL should have been called — image patch was blocked + Expect(sqlCallCount).To(Equal(0)) + + // Verify the CNPG cluster was NOT patched (extension image unchanged) + updatedCluster := &cnpgv1.Cluster{} + Expect(fakeClient.Get(ctx, types.NamespacedName{Name: clusterName, Namespace: clusterNamespace}, updatedCluster)).To(Succeed()) + Expect(updatedCluster.Spec.PostgresConfiguration.Extensions[0].ImageVolumeSource.Reference).To(Equal("ghcr.io/documentdb/documentdb:0.110.0")) + + // Verify ImageRollbackBlocked warning event was emitted + Expect(recorder.Events).To(HaveLen(1)) + event := <-recorder.Events + Expect(event).To(ContainSubstring("ImageRollbackBlocked")) + Expect(event).To(ContainSubstring("0.109.0")) + Expect(event).To(ContainSubstring("0.110.0")) + }) + }) + Describe("updateImageStatus", func() { It("should set DocumentDBImage and GatewayImage from cluster spec", func() { cluster := &cnpgv1.Cluster{ diff --git a/operator/src/internal/utils/util.go b/operator/src/internal/utils/util.go index fec8852f..751d2bce 100644 --- a/operator/src/internal/utils/util.go +++ b/operator/src/internal/utils/util.go @@ -510,6 +510,16 @@ func ExtensionVersionToSemver(v string) string { return v } +// SemverToExtensionVersion converts a semver string (e.g., "0.110.0") to the +// PostgreSQL extension version format (e.g., "0.110-0") used by pg_available_extensions. +// This is the inverse of ExtensionVersionToSemver. +func SemverToExtensionVersion(v string) string { + if idx := strings.LastIndex(v, "."); idx >= 0 { + return v[:idx] + "-" + v[idx+1:] + } + return v +} + // CompareExtensionVersions compares two DocumentDB extension version strings. // Format: "Major.Minor-Patch" (e.g., "0.110-0"). // Returns: -1 if v1 < v2, 0 if equal, +1 if v1 > v2. diff --git a/operator/src/internal/utils/util_test.go b/operator/src/internal/utils/util_test.go index c10f019b..ab0da533 100644 --- a/operator/src/internal/utils/util_test.go +++ b/operator/src/internal/utils/util_test.go @@ -1125,3 +1125,28 @@ func TestExtensionVersionToSemver(t *testing.T) { }) } } + +func TestSemverToExtensionVersion(t *testing.T) { + tests := []struct { + name string + input string + expected string + }{ + {name: "standard semver", input: "0.110.0", expected: "0.110-0"}, + {name: "non-zero patch", input: "0.110.1", expected: "0.110-1"}, + {name: "major version", input: "1.0.0", expected: "1.0-0"}, + {name: "all non-zero", input: "2.15.3", expected: "2.15-3"}, + {name: "large numbers", input: "100.999.50", expected: "100.999-50"}, + {name: "already extension format (no dot)", input: "110", expected: "110"}, + {name: "empty string", input: "", expected: ""}, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + result := SemverToExtensionVersion(tt.input) + if result != tt.expected { + t.Errorf("SemverToExtensionVersion(%q) = %q, want %q", tt.input, result, tt.expected) + } + }) + } +} From 4e240062d6d692ab84b3294c2265a17e704431d7 Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Thu, 26 Mar 2026 14:47:48 -0400 Subject: [PATCH 02/13] feat: add validating webhook for DocumentDB version safety Add a ValidatingWebhookConfiguration that enforces: - schemaVersion must be <= binary version (on create and update) - Image rollback below installed schema version is blocked (on update) Components added: - internal/webhook/documentdb_webhook.go: ValidateCreate/ValidateUpdate handlers - internal/webhook/documentdb_webhook_test.go: 18 unit tests - Helm template 10_documentdb_webhook.yaml: Issuer, Certificate, Service, ValidatingWebhookConfiguration with cert-manager CA injection - Updated 09_documentdb_operator.yaml: webhook port, cert volume mount, args - Updated cmd/main.go: webhook registration The webhook runs inside the existing operator process on port 9443 using cert-manager for TLS (same pattern as the sidecar injector). failurePolicy is set to Fail for database safety. The controller retains defense-in-depth checks as a secondary safety net. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Wenting Wu --- .../templates/09_documentdb_operator.yaml | 15 ++ .../templates/10_documentdb_webhook.yaml | 88 +++++++ operator/src/cmd/main.go | 7 + .../internal/webhook/documentdb_webhook.go | 172 ++++++++++++ .../webhook/documentdb_webhook_test.go | 245 ++++++++++++++++++ 5 files changed, 527 insertions(+) create mode 100644 operator/documentdb-helm-chart/templates/10_documentdb_webhook.yaml create mode 100644 operator/src/internal/webhook/documentdb_webhook.go create mode 100644 operator/src/internal/webhook/documentdb_webhook_test.go diff --git a/operator/documentdb-helm-chart/templates/09_documentdb_operator.yaml b/operator/documentdb-helm-chart/templates/09_documentdb_operator.yaml index 59a24c66..80b914e1 100644 --- a/operator/documentdb-helm-chart/templates/09_documentdb_operator.yaml +++ b/operator/documentdb-helm-chart/templates/09_documentdb_operator.yaml @@ -22,6 +22,16 @@ spec: - name: documentdb-operator image: "{{ .Values.image.documentdbk8soperator.repository }}:{{ .Values.image.documentdbk8soperator.tag | default .Chart.AppVersion }}" imagePullPolicy: "{{ .Values.image.documentdbk8soperator.pullPolicy }}" + args: + - --webhook-cert-path=/tmp/k8s-webhook-server/serving-certs + ports: + - containerPort: 9443 + name: webhook-server + protocol: TCP + volumeMounts: + - mountPath: /tmp/k8s-webhook-server/serving-certs + name: webhook-cert + readOnly: true env: - name: GATEWAY_PORT value: "10260" @@ -37,3 +47,8 @@ spec: - name: DOCUMENTDB_IMAGE_PULL_POLICY value: "{{ .Values.documentDbImagePullPolicy }}" {{- end }} + volumes: + - name: webhook-cert + secret: + secretName: documentdb-webhook-tls + defaultMode: 420 diff --git a/operator/documentdb-helm-chart/templates/10_documentdb_webhook.yaml b/operator/documentdb-helm-chart/templates/10_documentdb_webhook.yaml new file mode 100644 index 00000000..0801c1f5 --- /dev/null +++ b/operator/documentdb-helm-chart/templates/10_documentdb_webhook.yaml @@ -0,0 +1,88 @@ +{{- $ns := .Values.namespace | default .Release.Namespace -}} +# Self-signed Issuer for webhook TLS certificates in the operator namespace. +# The sidecar injector uses a separate Issuer in cnpg-system; this one is for the operator. +apiVersion: cert-manager.io/v1 +kind: Issuer +metadata: + name: documentdb-operator-selfsigned-issuer + namespace: {{ $ns }} + labels: + app.kubernetes.io/name: {{ include "documentdb-chart.name" . }} + app.kubernetes.io/managed-by: "Helm" +spec: + selfSigned: {} +--- +# TLS certificate for the validating webhook server. +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + name: documentdb-webhook-cert + namespace: {{ $ns }} + labels: + app.kubernetes.io/name: {{ include "documentdb-chart.name" . }} + app.kubernetes.io/managed-by: "Helm" +spec: + commonName: documentdb-webhook + dnsNames: + - documentdb-webhook-service.{{ $ns }}.svc + - documentdb-webhook-service.{{ $ns }}.svc.cluster.local + duration: 2160h # 90 days + renewBefore: 360h # 15 days + isCA: false + issuerRef: + group: cert-manager.io + kind: Issuer + name: documentdb-operator-selfsigned-issuer + secretName: documentdb-webhook-tls + usages: + - server auth +--- +# Service fronting the operator's webhook server on port 9443. +apiVersion: v1 +kind: Service +metadata: + name: documentdb-webhook-service + namespace: {{ $ns }} + labels: + app.kubernetes.io/name: {{ include "documentdb-chart.name" . }} + app.kubernetes.io/managed-by: "Helm" +spec: + ports: + - port: 443 + protocol: TCP + targetPort: 9443 + selector: + app: {{ .Release.Name }} +--- +# ValidatingWebhookConfiguration for DocumentDB resources. +# cert-manager injects the CA bundle automatically via the annotation. +apiVersion: admissionregistration.k8s.io/v1 +kind: ValidatingWebhookConfiguration +metadata: + name: documentdb-validating-webhook + labels: + app.kubernetes.io/name: {{ include "documentdb-chart.name" . }} + app.kubernetes.io/managed-by: "Helm" + annotations: + cert-manager.io/inject-ca-from: {{ $ns }}/documentdb-webhook-cert +webhooks: + - name: vdocumentdb.kb.io + admissionReviewVersions: + - v1 + clientConfig: + service: + name: documentdb-webhook-service + namespace: {{ $ns }} + path: /validate-documentdb-io-preview-documentdb + failurePolicy: Fail + rules: + - apiGroups: + - documentdb.io + apiVersions: + - preview + operations: + - CREATE + - UPDATE + resources: + - dbs + sideEffects: None diff --git a/operator/src/cmd/main.go b/operator/src/cmd/main.go index bf5f2ddf..db1bca93 100644 --- a/operator/src/cmd/main.go +++ b/operator/src/cmd/main.go @@ -29,6 +29,7 @@ import ( cnpgv1 "github.com/cloudnative-pg/cloudnative-pg/api/v1" dbpreview "github.com/documentdb/documentdb-operator/api/preview" "github.com/documentdb/documentdb-operator/internal/controller" + webhookhandler "github.com/documentdb/documentdb-operator/internal/webhook" fleetv1alpha1 "go.goms.io/fleet-networking/api/v1alpha1" // +kubebuilder:scaffold:imports ) @@ -248,6 +249,12 @@ func main() { // +kubebuilder:scaffold:builder + // Register the DocumentDB validating webhook + if err = (&webhookhandler.DocumentDBValidator{}).SetupWebhookWithManager(mgr); err != nil { + setupLog.Error(err, "unable to create webhook", "webhook", "DocumentDB") + os.Exit(1) + } + if metricsCertWatcher != nil { setupLog.Info("Adding metrics certificate watcher to manager") if err := mgr.Add(metricsCertWatcher); err != nil { diff --git a/operator/src/internal/webhook/documentdb_webhook.go b/operator/src/internal/webhook/documentdb_webhook.go new file mode 100644 index 00000000..12218215 --- /dev/null +++ b/operator/src/internal/webhook/documentdb_webhook.go @@ -0,0 +1,172 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +package webhook + +import ( + "context" + "fmt" + "strings" + + "k8s.io/apimachinery/pkg/runtime" + "k8s.io/apimachinery/pkg/util/validation/field" + ctrl "sigs.k8s.io/controller-runtime" + "sigs.k8s.io/controller-runtime/pkg/client" + logf "sigs.k8s.io/controller-runtime/pkg/log" + "sigs.k8s.io/controller-runtime/pkg/webhook/admission" + + dbpreview "github.com/documentdb/documentdb-operator/api/preview" + util "github.com/documentdb/documentdb-operator/internal/utils" +) + +var log = logf.Log.WithName("documentdb-webhook") + +// DocumentDBValidator validates DocumentDB resources on create and update. +type DocumentDBValidator struct { + client.Client +} + +// SetupWebhookWithManager registers the validating webhook with the manager. +func (v *DocumentDBValidator) SetupWebhookWithManager(mgr ctrl.Manager) error { + v.Client = mgr.GetClient() + return ctrl.NewWebhookManagedBy(mgr). + For(&dbpreview.DocumentDB{}). + WithValidator(v). + Complete() +} + +// +kubebuilder:webhook:path=/validate-documentdb-io-preview-documentdb,mutating=false,failurePolicy=fail,sideEffects=None,groups=documentdb.io,resources=dbs,verbs=create;update,versions=preview,name=vdocumentdb.kb.io,admissionReviewVersions=v1 + +// ValidateCreate validates a DocumentDB resource on creation. +func (v *DocumentDBValidator) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) { + documentdb, ok := obj.(*dbpreview.DocumentDB) + if !ok { + return nil, fmt.Errorf("expected DocumentDB but got %T", obj) + } + log.Info("validating create", "name", documentdb.Name) + + return validateSpec(documentdb) +} + +// ValidateUpdate validates a DocumentDB resource on update. +func (v *DocumentDBValidator) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) { + newDB, ok := newObj.(*dbpreview.DocumentDB) + if !ok { + return nil, fmt.Errorf("expected DocumentDB but got %T", newObj) + } + oldDB, ok := oldObj.(*dbpreview.DocumentDB) + if !ok { + return nil, fmt.Errorf("expected DocumentDB but got %T", oldObj) + } + log.Info("validating update", "name", newDB.Name) + + warnings, err := validateSpec(newDB) + if err != nil { + return warnings, err + } + + // Validate image rollback is not below installed schema version. + // This is the primary enforcement point — blocks the spec change before it is persisted. + if rollbackErr := validateImageRollback(oldDB, newDB); rollbackErr != nil { + return warnings, rollbackErr + } + + return warnings, nil +} + +// ValidateDelete is a no-op for DocumentDB. +func (v *DocumentDBValidator) ValidateDelete(ctx context.Context, obj runtime.Object) (admission.Warnings, error) { + return nil, nil +} + +// validateSpec checks spec-level invariants that apply to both create and update. +func validateSpec(db *dbpreview.DocumentDB) (admission.Warnings, error) { + var allErrs field.ErrorList + specPath := field.NewPath("spec") + + // If schemaVersion is set to an explicit version (not empty, not "auto"), + // it must be <= the resolved binary version. + if db.Spec.SchemaVersion != "" && db.Spec.SchemaVersion != "auto" { + binaryVersion := resolveBinaryVersion(db) + if binaryVersion != "" { + if err := validateSchemaNotExceedsBinary(db.Spec.SchemaVersion, binaryVersion, specPath); err != nil { + allErrs = append(allErrs, err) + } + } + } + + if len(allErrs) > 0 { + return nil, allErrs.ToAggregate() + } + return nil, nil +} + +// validateSchemaNotExceedsBinary ensures schemaVersion <= binaryVersion. +func validateSchemaNotExceedsBinary(schemaVersion, binaryVersion string, specPath *field.Path) *field.Error { + schemaPg := util.SemverToExtensionVersion(schemaVersion) + binaryPg := util.SemverToExtensionVersion(binaryVersion) + + cmp, err := util.CompareExtensionVersions(schemaPg, binaryPg) + if err != nil { + // If we can't parse versions, let it through — controller will catch it + return nil + } + if cmp > 0 { + return field.Invalid( + specPath.Child("schemaVersion"), + schemaVersion, + fmt.Sprintf("schemaVersion %s exceeds the binary version %s; schema version must be <= binary version", schemaVersion, binaryVersion), + ) + } + return nil +} + +// validateImageRollback blocks image downgrades below the installed schema version. +// Once ALTER EXTENSION UPDATE has run, the schema is irreversible. Running an older +// binary against a newer schema is untested and may cause data corruption. +func validateImageRollback(oldDB, newDB *dbpreview.DocumentDB) error { + installedSchemaVersion := oldDB.Status.SchemaVersion + if installedSchemaVersion == "" { + return nil + } + + // Determine the new binary version from the updated spec + newBinaryVersion := resolveBinaryVersion(newDB) + if newBinaryVersion == "" { + return nil + } + + // Compare: newBinaryVersion must be >= installedSchemaVersion + newPg := util.SemverToExtensionVersion(newBinaryVersion) + schemaPg := util.SemverToExtensionVersion(installedSchemaVersion) + + cmp, err := util.CompareExtensionVersions(newPg, schemaPg) + if err != nil { + // Can't parse — let it through, controller has defense-in-depth + return nil + } + if cmp < 0 { + return field.Forbidden( + field.NewPath("spec"), + fmt.Sprintf( + "image rollback blocked: requested version %s is older than installed schema version %s. "+ + "ALTER EXTENSION has no downgrade path — running an older binary with a newer schema may cause data corruption. "+ + "To recover, restore from backup or update to a version >= %s.", + newBinaryVersion, installedSchemaVersion, installedSchemaVersion), + ) + } + return nil +} + +// resolveBinaryVersion extracts the effective binary version from a DocumentDB spec. +// Priority: documentDBImage tag > documentDBVersion > "" (unknown). +func resolveBinaryVersion(db *dbpreview.DocumentDB) string { + // If explicit image is set, extract tag + if db.Spec.DocumentDBImage != "" { + if tagIdx := strings.LastIndex(db.Spec.DocumentDBImage, ":"); tagIdx >= 0 { + return db.Spec.DocumentDBImage[tagIdx+1:] + } + } + // Fall back to documentDBVersion + return db.Spec.DocumentDBVersion +} diff --git a/operator/src/internal/webhook/documentdb_webhook_test.go b/operator/src/internal/webhook/documentdb_webhook_test.go new file mode 100644 index 00000000..f7535b1a --- /dev/null +++ b/operator/src/internal/webhook/documentdb_webhook_test.go @@ -0,0 +1,245 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +package webhook + +import ( + "context" + "testing" + + dbpreview "github.com/documentdb/documentdb-operator/api/preview" + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" +) + +func newTestDocumentDB(version, schemaVersion, image string) *dbpreview.DocumentDB { + db := &dbpreview.DocumentDB{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-db", + Namespace: "default", + }, + Spec: dbpreview.DocumentDBSpec{ + NodeCount: 1, + InstancesPerNode: 1, + Resource: dbpreview.Resource{ + Storage: dbpreview.StorageConfiguration{PvcSize: "10Gi"}, + }, + }, + } + if version != "" { + db.Spec.DocumentDBVersion = version + } + if schemaVersion != "" { + db.Spec.SchemaVersion = schemaVersion + } + if image != "" { + db.Spec.DocumentDBImage = image + } + return db +} + +func TestValidateCreate_AllowsValidSpec(t *testing.T) { + v := &DocumentDBValidator{} + db := newTestDocumentDB("0.112.0", "", "") + warnings, err := v.ValidateCreate(context.Background(), db) + if err != nil { + t.Fatalf("expected no error, got %v", err) + } + if len(warnings) > 0 { + t.Fatalf("expected no warnings, got %v", warnings) + } +} + +func TestValidateCreate_AllowsAutoSchemaVersion(t *testing.T) { + v := &DocumentDBValidator{} + db := newTestDocumentDB("0.112.0", "auto", "") + warnings, err := v.ValidateCreate(context.Background(), db) + if err != nil { + t.Fatalf("expected no error, got %v", err) + } + if len(warnings) > 0 { + t.Fatalf("expected no warnings, got %v", warnings) + } +} + +func TestValidateCreate_AllowsSchemaVersionEqualToBinary(t *testing.T) { + v := &DocumentDBValidator{} + db := newTestDocumentDB("0.112.0", "0.112.0", "") + warnings, err := v.ValidateCreate(context.Background(), db) + if err != nil { + t.Fatalf("expected no error, got %v", err) + } + if len(warnings) > 0 { + t.Fatalf("expected no warnings, got %v", warnings) + } +} + +func TestValidateCreate_AllowsSchemaVersionBelowBinary(t *testing.T) { + v := &DocumentDBValidator{} + db := newTestDocumentDB("0.112.0", "0.110.0", "") + warnings, err := v.ValidateCreate(context.Background(), db) + if err != nil { + t.Fatalf("expected no error, got %v", err) + } + if len(warnings) > 0 { + t.Fatalf("expected no warnings, got %v", warnings) + } +} + +func TestValidateCreate_RejectsSchemaVersionAboveBinary(t *testing.T) { + v := &DocumentDBValidator{} + db := newTestDocumentDB("0.110.0", "0.112.0", "") + _, err := v.ValidateCreate(context.Background(), db) + if err == nil { + t.Fatal("expected error when schemaVersion > binaryVersion, got nil") + } + t.Logf("Got expected error: %v", err) +} + +func TestValidateCreate_AllowsSchemaVersionWithImageTag(t *testing.T) { + v := &DocumentDBValidator{} + db := newTestDocumentDB("", "0.112.0", "ghcr.io/documentdb/documentdb:0.112.0") + warnings, err := v.ValidateCreate(context.Background(), db) + if err != nil { + t.Fatalf("expected no error, got %v", err) + } + if len(warnings) > 0 { + t.Fatalf("expected no warnings, got %v", warnings) + } +} + +func TestValidateCreate_RejectsSchemaVersionAboveImageTag(t *testing.T) { + v := &DocumentDBValidator{} + db := newTestDocumentDB("", "0.115.0", "ghcr.io/documentdb/documentdb:0.112.0") + _, err := v.ValidateCreate(context.Background(), db) + if err == nil { + t.Fatal("expected error when schemaVersion > image tag version, got nil") + } + t.Logf("Got expected error: %v", err) +} + +func TestValidateCreate_AllowsNoBinaryVersion(t *testing.T) { + // When no binary version can be resolved, skip validation (operator defaults will apply) + v := &DocumentDBValidator{} + db := newTestDocumentDB("", "0.112.0", "") + warnings, err := v.ValidateCreate(context.Background(), db) + if err != nil { + t.Fatalf("expected no error when binary version unknown, got %v", err) + } + if len(warnings) > 0 { + t.Fatalf("expected no warnings, got %v", warnings) + } +} + +func TestValidateUpdate_AllowsValidUpgrade(t *testing.T) { + v := &DocumentDBValidator{} + oldDB := newTestDocumentDB("0.110.0", "", "") + oldDB.Status.SchemaVersion = "0.110.0" + newDB := newTestDocumentDB("0.112.0", "", "") + warnings, err := v.ValidateUpdate(context.Background(), oldDB, newDB) + if err != nil { + t.Fatalf("expected no error, got %v", err) + } + if len(warnings) > 0 { + t.Fatalf("expected no warnings, got %v", warnings) + } +} + +func TestValidateUpdate_BlocksImageRollbackBelowSchema(t *testing.T) { + v := &DocumentDBValidator{} + oldDB := newTestDocumentDB("0.112.0", "auto", "") + oldDB.Status.SchemaVersion = "0.112.0" + newDB := newTestDocumentDB("0.110.0", "auto", "") + _, err := v.ValidateUpdate(context.Background(), oldDB, newDB) + if err == nil { + t.Fatal("expected error when rolling back below schema version, got nil") + } + t.Logf("Got expected error: %v", err) +} + +func TestValidateUpdate_AllowsImageRollbackWhenNoSchema(t *testing.T) { + v := &DocumentDBValidator{} + oldDB := newTestDocumentDB("0.112.0", "", "") + // No schema version installed — rollback is safe + newDB := newTestDocumentDB("0.110.0", "", "") + warnings, err := v.ValidateUpdate(context.Background(), oldDB, newDB) + if err != nil { + t.Fatalf("expected no error when no schema installed, got %v", err) + } + if len(warnings) > 0 { + t.Fatalf("expected no warnings, got %v", warnings) + } +} + +func TestValidateUpdate_AllowsSameVersionOnUpdate(t *testing.T) { + v := &DocumentDBValidator{} + oldDB := newTestDocumentDB("0.112.0", "auto", "") + oldDB.Status.SchemaVersion = "0.112.0" + newDB := newTestDocumentDB("0.112.0", "auto", "") + warnings, err := v.ValidateUpdate(context.Background(), oldDB, newDB) + if err != nil { + t.Fatalf("expected no error for same version, got %v", err) + } + if len(warnings) > 0 { + t.Fatalf("expected no warnings, got %v", warnings) + } +} + +func TestValidateUpdate_BlocksSchemaVersionAboveBinary(t *testing.T) { + v := &DocumentDBValidator{} + oldDB := newTestDocumentDB("0.110.0", "", "") + oldDB.Status.SchemaVersion = "0.110.0" + newDB := newTestDocumentDB("0.110.0", "0.112.0", "") + _, err := v.ValidateUpdate(context.Background(), oldDB, newDB) + if err == nil { + t.Fatal("expected error when schemaVersion > binary on update, got nil") + } + t.Logf("Got expected error: %v", err) +} + +func TestValidateUpdate_ImageRollbackBlockedViaImageField(t *testing.T) { + v := &DocumentDBValidator{} + oldDB := newTestDocumentDB("", "", "ghcr.io/documentdb/documentdb:0.112.0") + oldDB.Status.SchemaVersion = "0.112.0" + newDB := newTestDocumentDB("", "", "ghcr.io/documentdb/documentdb:0.110.0") + _, err := v.ValidateUpdate(context.Background(), oldDB, newDB) + if err == nil { + t.Fatal("expected error when rolling back image below schema, got nil") + } + t.Logf("Got expected error: %v", err) +} + +func TestValidateDelete_AlwaysAllowed(t *testing.T) { + v := &DocumentDBValidator{} + db := newTestDocumentDB("0.112.0", "auto", "") + warnings, err := v.ValidateDelete(context.Background(), db) + if err != nil { + t.Fatalf("expected no error on delete, got %v", err) + } + if len(warnings) > 0 { + t.Fatalf("expected no warnings, got %v", warnings) + } +} + +func TestResolveBinaryVersion_PrefersImageTag(t *testing.T) { + db := newTestDocumentDB("0.110.0", "", "ghcr.io/documentdb/documentdb:0.112.0") + v := resolveBinaryVersion(db) + if v != "0.112.0" { + t.Fatalf("expected 0.112.0 from image tag, got %s", v) + } +} + +func TestResolveBinaryVersion_FallsBackToDocumentDBVersion(t *testing.T) { + db := newTestDocumentDB("0.110.0", "", "") + v := resolveBinaryVersion(db) + if v != "0.110.0" { + t.Fatalf("expected 0.110.0, got %s", v) + } +} + +func TestResolveBinaryVersion_ReturnsEmptyWhenNoneSet(t *testing.T) { + db := newTestDocumentDB("", "", "") + v := resolveBinaryVersion(db) + if v != "" { + t.Fatalf("expected empty, got %s", v) + } +} From 6932649bde2e1eca936c064ca8f7e37f1ccb7f34 Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Thu, 26 Mar 2026 15:08:08 -0400 Subject: [PATCH 03/13] refactor: adopt CNPG validation function list pattern Restructure webhook to use two validation function slices following the CNPG ClusterCustomValidator pattern: - validate(db) spec-level rules run on both create and update Contains: validateSchemaVersionNotExceedsBinary - validateChanges(new, old) update-only rules comparing old vs new Contains: validateImageRollback This makes it easy to add new validation rules just append a function to the appropriate slice. Each validator has a consistent signature returning field.ErrorList, and errors from all validators are aggregated. Also adds var _ webhook.CustomValidator = &DocumentDBValidator{} compile check and uses apierrors.NewInvalid for proper Kubernetes error format. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Wenting Wu --- .../internal/webhook/documentdb_webhook.go | 136 +++++++++++------- 1 file changed, 83 insertions(+), 53 deletions(-) diff --git a/operator/src/internal/webhook/documentdb_webhook.go b/operator/src/internal/webhook/documentdb_webhook.go index 12218215..4b11261b 100644 --- a/operator/src/internal/webhook/documentdb_webhook.go +++ b/operator/src/internal/webhook/documentdb_webhook.go @@ -8,24 +8,29 @@ import ( "fmt" "strings" + apierrors "k8s.io/apimachinery/pkg/api/errors" "k8s.io/apimachinery/pkg/runtime" + "k8s.io/apimachinery/pkg/runtime/schema" "k8s.io/apimachinery/pkg/util/validation/field" ctrl "sigs.k8s.io/controller-runtime" "sigs.k8s.io/controller-runtime/pkg/client" logf "sigs.k8s.io/controller-runtime/pkg/log" + "sigs.k8s.io/controller-runtime/pkg/webhook" "sigs.k8s.io/controller-runtime/pkg/webhook/admission" dbpreview "github.com/documentdb/documentdb-operator/api/preview" util "github.com/documentdb/documentdb-operator/internal/utils" ) -var log = logf.Log.WithName("documentdb-webhook") +var documentdbLog = logf.Log.WithName("documentdb-webhook") // DocumentDBValidator validates DocumentDB resources on create and update. type DocumentDBValidator struct { client.Client } +var _ webhook.CustomValidator = &DocumentDBValidator{} + // SetupWebhookWithManager registers the validating webhook with the manager. func (v *DocumentDBValidator) SetupWebhookWithManager(mgr ctrl.Manager) error { v.Client = mgr.GetClient() @@ -38,18 +43,24 @@ func (v *DocumentDBValidator) SetupWebhookWithManager(mgr ctrl.Manager) error { // +kubebuilder:webhook:path=/validate-documentdb-io-preview-documentdb,mutating=false,failurePolicy=fail,sideEffects=None,groups=documentdb.io,resources=dbs,verbs=create;update,versions=preview,name=vdocumentdb.kb.io,admissionReviewVersions=v1 // ValidateCreate validates a DocumentDB resource on creation. -func (v *DocumentDBValidator) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) { +func (v *DocumentDBValidator) ValidateCreate(_ context.Context, obj runtime.Object) (admission.Warnings, error) { documentdb, ok := obj.(*dbpreview.DocumentDB) if !ok { return nil, fmt.Errorf("expected DocumentDB but got %T", obj) } - log.Info("validating create", "name", documentdb.Name) + documentdbLog.Info("Validation for DocumentDB upon creation", "name", documentdb.Name, "namespace", documentdb.Namespace) - return validateSpec(documentdb) + allErrs := v.validate(documentdb) + if len(allErrs) == 0 { + return nil, nil + } + return nil, apierrors.NewInvalid( + schema.GroupKind{Group: "documentdb.io", Kind: "DocumentDB"}, + documentdb.Name, allErrs) } // ValidateUpdate validates a DocumentDB resource on update. -func (v *DocumentDBValidator) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) { +func (v *DocumentDBValidator) ValidateUpdate(_ context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) { newDB, ok := newObj.(*dbpreview.DocumentDB) if !ok { return nil, fmt.Errorf("expected DocumentDB but got %T", newObj) @@ -58,52 +69,54 @@ func (v *DocumentDBValidator) ValidateUpdate(ctx context.Context, oldObj, newObj if !ok { return nil, fmt.Errorf("expected DocumentDB but got %T", oldObj) } - log.Info("validating update", "name", newDB.Name) - - warnings, err := validateSpec(newDB) - if err != nil { - return warnings, err - } - - // Validate image rollback is not below installed schema version. - // This is the primary enforcement point — blocks the spec change before it is persisted. - if rollbackErr := validateImageRollback(oldDB, newDB); rollbackErr != nil { - return warnings, rollbackErr + documentdbLog.Info("Validation for DocumentDB upon update", "name", newDB.Name, "namespace", newDB.Namespace) + + allErrs := append( + v.validate(newDB), + v.validateChanges(newDB, oldDB)..., + ) + if len(allErrs) == 0 { + return nil, nil } - - return warnings, nil + return nil, apierrors.NewInvalid( + schema.GroupKind{Group: "documentdb.io", Kind: "DocumentDB"}, + newDB.Name, allErrs) } // ValidateDelete is a no-op for DocumentDB. -func (v *DocumentDBValidator) ValidateDelete(ctx context.Context, obj runtime.Object) (admission.Warnings, error) { +func (v *DocumentDBValidator) ValidateDelete(_ context.Context, _ runtime.Object) (admission.Warnings, error) { return nil, nil } -// validateSpec checks spec-level invariants that apply to both create and update. -func validateSpec(db *dbpreview.DocumentDB) (admission.Warnings, error) { - var allErrs field.ErrorList - specPath := field.NewPath("spec") - - // If schemaVersion is set to an explicit version (not empty, not "auto"), - // it must be <= the resolved binary version. - if db.Spec.SchemaVersion != "" && db.Spec.SchemaVersion != "auto" { - binaryVersion := resolveBinaryVersion(db) - if binaryVersion != "" { - if err := validateSchemaNotExceedsBinary(db.Spec.SchemaVersion, binaryVersion, specPath); err != nil { - allErrs = append(allErrs, err) - } - } - } +// --------------------------------------------------------------------------- +// Spec-level validations (run on both create and update) +// --------------------------------------------------------------------------- - if len(allErrs) > 0 { - return nil, allErrs.ToAggregate() +// validate runs all spec-level validation rules, returning a combined error list. +func (v *DocumentDBValidator) validate(db *dbpreview.DocumentDB) (allErrs field.ErrorList) { + type validationFunc func(*dbpreview.DocumentDB) field.ErrorList + validations := []validationFunc{ + v.validateSchemaVersionNotExceedsBinary, + // Add new spec-level validations here. } - return nil, nil + for _, fn := range validations { + allErrs = append(allErrs, fn(db)...) + } + return allErrs } -// validateSchemaNotExceedsBinary ensures schemaVersion <= binaryVersion. -func validateSchemaNotExceedsBinary(schemaVersion, binaryVersion string, specPath *field.Path) *field.Error { - schemaPg := util.SemverToExtensionVersion(schemaVersion) +// validateSchemaVersionNotExceedsBinary ensures spec.schemaVersion <= binary version. +func (v *DocumentDBValidator) validateSchemaVersionNotExceedsBinary(db *dbpreview.DocumentDB) field.ErrorList { + if db.Spec.SchemaVersion == "" || db.Spec.SchemaVersion == "auto" { + return nil + } + + binaryVersion := resolveBinaryVersion(db) + if binaryVersion == "" { + return nil + } + + schemaPg := util.SemverToExtensionVersion(db.Spec.SchemaVersion) binaryPg := util.SemverToExtensionVersion(binaryVersion) cmp, err := util.CompareExtensionVersions(schemaPg, binaryPg) @@ -112,61 +125,78 @@ func validateSchemaNotExceedsBinary(schemaVersion, binaryVersion string, specPat return nil } if cmp > 0 { - return field.Invalid( - specPath.Child("schemaVersion"), - schemaVersion, - fmt.Sprintf("schemaVersion %s exceeds the binary version %s; schema version must be <= binary version", schemaVersion, binaryVersion), - ) + return field.ErrorList{field.Invalid( + field.NewPath("spec", "schemaVersion"), + db.Spec.SchemaVersion, + fmt.Sprintf("schemaVersion %s exceeds the binary version %s; schema version must be <= binary version", + db.Spec.SchemaVersion, binaryVersion), + )} } return nil } +// --------------------------------------------------------------------------- +// Update-only validations (compare old and new) +// --------------------------------------------------------------------------- + +// validateChanges runs all update-specific validation rules that compare old vs new state. +func (v *DocumentDBValidator) validateChanges(newDB, oldDB *dbpreview.DocumentDB) (allErrs field.ErrorList) { + type validationFunc func(newDB, oldDB *dbpreview.DocumentDB) field.ErrorList + validations := []validationFunc{ + v.validateImageRollback, + // Add new update-only validations here. + } + for _, fn := range validations { + allErrs = append(allErrs, fn(newDB, oldDB)...) + } + return allErrs +} + // validateImageRollback blocks image downgrades below the installed schema version. // Once ALTER EXTENSION UPDATE has run, the schema is irreversible. Running an older // binary against a newer schema is untested and may cause data corruption. -func validateImageRollback(oldDB, newDB *dbpreview.DocumentDB) error { +func (v *DocumentDBValidator) validateImageRollback(newDB, oldDB *dbpreview.DocumentDB) field.ErrorList { installedSchemaVersion := oldDB.Status.SchemaVersion if installedSchemaVersion == "" { return nil } - // Determine the new binary version from the updated spec newBinaryVersion := resolveBinaryVersion(newDB) if newBinaryVersion == "" { return nil } - // Compare: newBinaryVersion must be >= installedSchemaVersion newPg := util.SemverToExtensionVersion(newBinaryVersion) schemaPg := util.SemverToExtensionVersion(installedSchemaVersion) cmp, err := util.CompareExtensionVersions(newPg, schemaPg) if err != nil { - // Can't parse — let it through, controller has defense-in-depth return nil } if cmp < 0 { - return field.Forbidden( + return field.ErrorList{field.Forbidden( field.NewPath("spec"), fmt.Sprintf( "image rollback blocked: requested version %s is older than installed schema version %s. "+ "ALTER EXTENSION has no downgrade path — running an older binary with a newer schema may cause data corruption. "+ "To recover, restore from backup or update to a version >= %s.", newBinaryVersion, installedSchemaVersion, installedSchemaVersion), - ) + )} } return nil } +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + // resolveBinaryVersion extracts the effective binary version from a DocumentDB spec. // Priority: documentDBImage tag > documentDBVersion > "" (unknown). func resolveBinaryVersion(db *dbpreview.DocumentDB) string { - // If explicit image is set, extract tag if db.Spec.DocumentDBImage != "" { if tagIdx := strings.LastIndex(db.Spec.DocumentDBImage, ":"); tagIdx >= 0 { return db.Spec.DocumentDBImage[tagIdx+1:] } } - // Fall back to documentDBVersion return db.Spec.DocumentDBVersion } From d369756050f952be42d1b210d51942b1e4202ac7 Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Thu, 26 Mar 2026 15:38:56 -0400 Subject: [PATCH 04/13] refactor: simplify controller by moving validations to webhook - Remove Step 1b (image rollback blocking) from controller webhook handles it - Simplify determineSchemaTarget: keep lightweight defense-in-depth guard - Rename Pg-suffixed variables to full names (schemaExtensionVersion, etc.) - Refactor webhook tests to Ginkgo/Gomega (matching CNPG and controller patterns) - Add suite_test.go for webhook package Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../controller/documentdb_controller.go | 57 +-- .../controller/documentdb_controller_test.go | 84 +--- .../internal/webhook/documentdb_webhook.go | 12 +- .../webhook/documentdb_webhook_test.go | 399 +++++++++--------- operator/src/internal/webhook/suite_test.go | 16 + 5 files changed, 229 insertions(+), 339 deletions(-) create mode 100644 operator/src/internal/webhook/suite_test.go diff --git a/operator/src/internal/controller/documentdb_controller.go b/operator/src/internal/controller/documentdb_controller.go index 71d8f081..7b1bbe3c 100644 --- a/operator/src/internal/controller/documentdb_controller.go +++ b/operator/src/internal/controller/documentdb_controller.go @@ -853,41 +853,8 @@ func (r *DocumentDBReconciler) upgradeDocumentDBIfNeeded(ctx context.Context, cu return fmt.Errorf("failed to build image patch operations: %w", err) } - // Step 1b: Block extension image rollback below installed schema version. - // Once ALTER EXTENSION UPDATE has run, the schema is irreversible. Running an older - // binary against a newer schema is untested and may cause data corruption. - if extensionUpdated && documentdb.Status.SchemaVersion != "" { - // Extract the version tag from the desired extension image reference - desiredExtImage := "" - for _, ext := range desiredCluster.Spec.PostgresConfiguration.Extensions { - if ext.Name == "documentdb" { - desiredExtImage = ext.ImageVolumeSource.Reference - break - } - } - if desiredExtImage != "" { - // Extract tag from image reference (e.g., "ghcr.io/.../documentdb:0.112.0" → "0.112.0") - if tagIdx := strings.LastIndex(desiredExtImage, ":"); tagIdx >= 0 { - newTag := desiredExtImage[tagIdx+1:] - // Convert both to extension format for comparison - newExtVersion := util.SemverToExtensionVersion(newTag) - schemaExtVersion := util.SemverToExtensionVersion(documentdb.Status.SchemaVersion) - cmp, err := util.CompareExtensionVersions(newExtVersion, schemaExtVersion) - if err == nil && cmp < 0 { - msg := fmt.Sprintf( - "Image rollback blocked: requested image version %s is older than installed schema version %s. "+ - "ALTER EXTENSION has no downgrade path — running an older binary with a newer schema may cause data corruption. "+ - "To recover, restore from backup or update to a version >= %s.", - newTag, documentdb.Status.SchemaVersion, documentdb.Status.SchemaVersion) - logger.Info(msg) - if r.Recorder != nil { - r.Recorder.Event(documentdb, corev1.EventTypeWarning, "ImageRollbackBlocked", msg) - } - return nil - } - } - } - } + // Note: Image rollback below installed schema version is blocked by the validating + // webhook at admission time. The controller trusts that persisted specs are valid. // Step 2: Apply patch if any images need updating if len(patchOps) > 0 { @@ -1092,11 +1059,12 @@ func (r *DocumentDBReconciler) determineSchemaTarget( return binaryVersion, "ALTER EXTENSION documentdb UPDATE" default: - // Explicit version: validate and update to that specific version - // Convert semver ("0.112.0") to pg extension format ("0.112-0") for comparison + // Explicit version: update to that specific version. + // Note: schemaVersion <= binary version is enforced by the validating webhook at + // admission time. This is a defense-in-depth guard. targetPgVersion := util.SemverToExtensionVersion(specSchemaVersion) - // Validate: target must not exceed binary version + // Guard: target must not exceed binary version targetCmp, err := util.CompareExtensionVersions(targetPgVersion, binaryVersion) if err != nil { logger.Error(err, "Failed to compare target schema version with binary version", @@ -1105,18 +1073,13 @@ func (r *DocumentDBReconciler) determineSchemaTarget( return "", "" } if targetCmp > 0 { - msg := fmt.Sprintf( - "Requested schema version %s exceeds binary version %s. "+ - "Schema version must be <= binary version. Skipping schema update.", - specSchemaVersion, util.ExtensionVersionToSemver(binaryVersion)) - logger.Info(msg) - if r.Recorder != nil { - r.Recorder.Event(documentdb, corev1.EventTypeWarning, "SchemaVersionExceedsBinary", msg) - } + logger.Info("Skipping schema update: schemaVersion exceeds binary version (should have been rejected by webhook)", + "schemaVersion", specSchemaVersion, + "binaryVersion", util.ExtensionVersionToSemver(binaryVersion)) return "", "" } - // Validate: target must be > installed version (otherwise no-op) + // Check: target must be > installed version (otherwise no-op) installedCmp, err := util.CompareExtensionVersions(targetPgVersion, installedVersion) if err != nil { logger.Error(err, "Failed to compare target schema version with installed version", diff --git a/operator/src/internal/controller/documentdb_controller_test.go b/operator/src/internal/controller/documentdb_controller_test.go index 3f60fddf..cba94470 100644 --- a/operator/src/internal/controller/documentdb_controller_test.go +++ b/operator/src/internal/controller/documentdb_controller_test.go @@ -2505,7 +2505,10 @@ var _ = Describe("DocumentDB Controller", func() { Expect(updatedDB.Status.SchemaVersion).To(Equal("0.110.0")) }) - It("should emit warning and skip when explicit schemaVersion exceeds binary version", func() { + It("should skip when explicit schemaVersion exceeds binary version", func() { + // Note: In production, the validating webhook rejects schemaVersion > binary + // at admission time. This test verifies the controller gracefully handles the + // case as a no-op (defense-in-depth). cluster := &cnpgv1.Cluster{ ObjectMeta: metav1.ObjectMeta{ Name: clusterName, @@ -2638,83 +2641,8 @@ var _ = Describe("DocumentDB Controller", func() { Expect(sqlCalls).To(HaveLen(1)) }) - It("should block image rollback when new image version is below installed schema version", func() { - // Current cluster has old image - cluster := &cnpgv1.Cluster{ - ObjectMeta: metav1.ObjectMeta{ - Name: clusterName, - Namespace: clusterNamespace, - }, - Spec: cnpgv1.ClusterSpec{ - PostgresConfiguration: cnpgv1.PostgresConfiguration{ - Extensions: []cnpgv1.ExtensionConfiguration{ - { - Name: "documentdb", - ImageVolumeSource: corev1.ImageVolumeSource{ - Reference: "ghcr.io/documentdb/documentdb:0.110.0", - }, - }, - }, - }, - }, - Status: cnpgv1.ClusterStatus{ - CurrentPrimary: "test-cluster-1", - InstancesStatus: map[cnpgv1.PodStatus][]string{ - cnpgv1.PodHealthy: {"test-cluster-1"}, - }, - }, - } - - // Desired cluster has OLDER image (rollback attempt) - desiredCluster := cluster.DeepCopy() - desiredCluster.Spec.PostgresConfiguration.Extensions[0].ImageVolumeSource.Reference = "ghcr.io/documentdb/documentdb:0.109.0" - - // Schema is already at 0.110.0 (ALTER EXTENSION was previously applied) - documentdb := &dbpreview.DocumentDB{ - ObjectMeta: metav1.ObjectMeta{ - Name: "test-documentdb", - Namespace: clusterNamespace, - }, - Status: dbpreview.DocumentDBStatus{ - SchemaVersion: "0.110.0", - }, - } - - fakeClient := fake.NewClientBuilder(). - WithScheme(scheme). - WithObjects(cluster, documentdb). - WithStatusSubresource(&dbpreview.DocumentDB{}). - Build() - - sqlCallCount := 0 - reconciler := &DocumentDBReconciler{ - Client: fakeClient, - Scheme: scheme, - Recorder: recorder, - SQLExecutor: func(_ context.Context, _ *cnpgv1.Cluster, sql string) (string, error) { - sqlCallCount++ - return "", nil - }, - } - - err := reconciler.upgradeDocumentDBIfNeeded(ctx, cluster, desiredCluster, documentdb) - Expect(err).ToNot(HaveOccurred()) - - // No SQL should have been called — image patch was blocked - Expect(sqlCallCount).To(Equal(0)) - - // Verify the CNPG cluster was NOT patched (extension image unchanged) - updatedCluster := &cnpgv1.Cluster{} - Expect(fakeClient.Get(ctx, types.NamespacedName{Name: clusterName, Namespace: clusterNamespace}, updatedCluster)).To(Succeed()) - Expect(updatedCluster.Spec.PostgresConfiguration.Extensions[0].ImageVolumeSource.Reference).To(Equal("ghcr.io/documentdb/documentdb:0.110.0")) - - // Verify ImageRollbackBlocked warning event was emitted - Expect(recorder.Events).To(HaveLen(1)) - event := <-recorder.Events - Expect(event).To(ContainSubstring("ImageRollbackBlocked")) - Expect(event).To(ContainSubstring("0.109.0")) - Expect(event).To(ContainSubstring("0.110.0")) - }) + // Note: Image rollback blocking is now enforced by the validating webhook at + // admission time. The controller no longer needs to check for this case. }) Describe("updateImageStatus", func() { diff --git a/operator/src/internal/webhook/documentdb_webhook.go b/operator/src/internal/webhook/documentdb_webhook.go index 4b11261b..d0f0343f 100644 --- a/operator/src/internal/webhook/documentdb_webhook.go +++ b/operator/src/internal/webhook/documentdb_webhook.go @@ -116,10 +116,10 @@ func (v *DocumentDBValidator) validateSchemaVersionNotExceedsBinary(db *dbprevie return nil } - schemaPg := util.SemverToExtensionVersion(db.Spec.SchemaVersion) - binaryPg := util.SemverToExtensionVersion(binaryVersion) + schemaExtensionVersion := util.SemverToExtensionVersion(db.Spec.SchemaVersion) + binaryExtensionVersion := util.SemverToExtensionVersion(binaryVersion) - cmp, err := util.CompareExtensionVersions(schemaPg, binaryPg) + cmp, err := util.CompareExtensionVersions(schemaExtensionVersion, binaryExtensionVersion) if err != nil { // If we can't parse versions, let it through — controller will catch it return nil @@ -166,10 +166,10 @@ func (v *DocumentDBValidator) validateImageRollback(newDB, oldDB *dbpreview.Docu return nil } - newPg := util.SemverToExtensionVersion(newBinaryVersion) - schemaPg := util.SemverToExtensionVersion(installedSchemaVersion) + newBinaryExtensionVersion := util.SemverToExtensionVersion(newBinaryVersion) + schemaExtensionVersion := util.SemverToExtensionVersion(installedSchemaVersion) - cmp, err := util.CompareExtensionVersions(newPg, schemaPg) + cmp, err := util.CompareExtensionVersions(newBinaryExtensionVersion, schemaExtensionVersion) if err != nil { return nil } diff --git a/operator/src/internal/webhook/documentdb_webhook_test.go b/operator/src/internal/webhook/documentdb_webhook_test.go index f7535b1a..bda9dc6a 100644 --- a/operator/src/internal/webhook/documentdb_webhook_test.go +++ b/operator/src/internal/webhook/documentdb_webhook_test.go @@ -5,10 +5,12 @@ package webhook import ( "context" - "testing" - dbpreview "github.com/documentdb/documentdb-operator/api/preview" + . "github.com/onsi/ginkgo/v2" + . "github.com/onsi/gomega" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + + dbpreview "github.com/documentdb/documentdb-operator/api/preview" ) func newTestDocumentDB(version, schemaVersion, image string) *dbpreview.DocumentDB { @@ -37,209 +39,190 @@ func newTestDocumentDB(version, schemaVersion, image string) *dbpreview.Document return db } -func TestValidateCreate_AllowsValidSpec(t *testing.T) { - v := &DocumentDBValidator{} - db := newTestDocumentDB("0.112.0", "", "") - warnings, err := v.ValidateCreate(context.Background(), db) - if err != nil { - t.Fatalf("expected no error, got %v", err) - } - if len(warnings) > 0 { - t.Fatalf("expected no warnings, got %v", warnings) - } -} - -func TestValidateCreate_AllowsAutoSchemaVersion(t *testing.T) { - v := &DocumentDBValidator{} - db := newTestDocumentDB("0.112.0", "auto", "") - warnings, err := v.ValidateCreate(context.Background(), db) - if err != nil { - t.Fatalf("expected no error, got %v", err) - } - if len(warnings) > 0 { - t.Fatalf("expected no warnings, got %v", warnings) - } -} - -func TestValidateCreate_AllowsSchemaVersionEqualToBinary(t *testing.T) { - v := &DocumentDBValidator{} - db := newTestDocumentDB("0.112.0", "0.112.0", "") - warnings, err := v.ValidateCreate(context.Background(), db) - if err != nil { - t.Fatalf("expected no error, got %v", err) - } - if len(warnings) > 0 { - t.Fatalf("expected no warnings, got %v", warnings) - } -} - -func TestValidateCreate_AllowsSchemaVersionBelowBinary(t *testing.T) { - v := &DocumentDBValidator{} - db := newTestDocumentDB("0.112.0", "0.110.0", "") - warnings, err := v.ValidateCreate(context.Background(), db) - if err != nil { - t.Fatalf("expected no error, got %v", err) - } - if len(warnings) > 0 { - t.Fatalf("expected no warnings, got %v", warnings) - } -} - -func TestValidateCreate_RejectsSchemaVersionAboveBinary(t *testing.T) { - v := &DocumentDBValidator{} - db := newTestDocumentDB("0.110.0", "0.112.0", "") - _, err := v.ValidateCreate(context.Background(), db) - if err == nil { - t.Fatal("expected error when schemaVersion > binaryVersion, got nil") - } - t.Logf("Got expected error: %v", err) -} - -func TestValidateCreate_AllowsSchemaVersionWithImageTag(t *testing.T) { - v := &DocumentDBValidator{} - db := newTestDocumentDB("", "0.112.0", "ghcr.io/documentdb/documentdb:0.112.0") - warnings, err := v.ValidateCreate(context.Background(), db) - if err != nil { - t.Fatalf("expected no error, got %v", err) - } - if len(warnings) > 0 { - t.Fatalf("expected no warnings, got %v", warnings) - } -} - -func TestValidateCreate_RejectsSchemaVersionAboveImageTag(t *testing.T) { - v := &DocumentDBValidator{} - db := newTestDocumentDB("", "0.115.0", "ghcr.io/documentdb/documentdb:0.112.0") - _, err := v.ValidateCreate(context.Background(), db) - if err == nil { - t.Fatal("expected error when schemaVersion > image tag version, got nil") - } - t.Logf("Got expected error: %v", err) -} - -func TestValidateCreate_AllowsNoBinaryVersion(t *testing.T) { - // When no binary version can be resolved, skip validation (operator defaults will apply) - v := &DocumentDBValidator{} - db := newTestDocumentDB("", "0.112.0", "") - warnings, err := v.ValidateCreate(context.Background(), db) - if err != nil { - t.Fatalf("expected no error when binary version unknown, got %v", err) - } - if len(warnings) > 0 { - t.Fatalf("expected no warnings, got %v", warnings) - } -} - -func TestValidateUpdate_AllowsValidUpgrade(t *testing.T) { - v := &DocumentDBValidator{} - oldDB := newTestDocumentDB("0.110.0", "", "") - oldDB.Status.SchemaVersion = "0.110.0" - newDB := newTestDocumentDB("0.112.0", "", "") - warnings, err := v.ValidateUpdate(context.Background(), oldDB, newDB) - if err != nil { - t.Fatalf("expected no error, got %v", err) - } - if len(warnings) > 0 { - t.Fatalf("expected no warnings, got %v", warnings) - } -} - -func TestValidateUpdate_BlocksImageRollbackBelowSchema(t *testing.T) { - v := &DocumentDBValidator{} - oldDB := newTestDocumentDB("0.112.0", "auto", "") - oldDB.Status.SchemaVersion = "0.112.0" - newDB := newTestDocumentDB("0.110.0", "auto", "") - _, err := v.ValidateUpdate(context.Background(), oldDB, newDB) - if err == nil { - t.Fatal("expected error when rolling back below schema version, got nil") - } - t.Logf("Got expected error: %v", err) -} - -func TestValidateUpdate_AllowsImageRollbackWhenNoSchema(t *testing.T) { - v := &DocumentDBValidator{} - oldDB := newTestDocumentDB("0.112.0", "", "") - // No schema version installed — rollback is safe - newDB := newTestDocumentDB("0.110.0", "", "") - warnings, err := v.ValidateUpdate(context.Background(), oldDB, newDB) - if err != nil { - t.Fatalf("expected no error when no schema installed, got %v", err) - } - if len(warnings) > 0 { - t.Fatalf("expected no warnings, got %v", warnings) - } -} - -func TestValidateUpdate_AllowsSameVersionOnUpdate(t *testing.T) { - v := &DocumentDBValidator{} - oldDB := newTestDocumentDB("0.112.0", "auto", "") - oldDB.Status.SchemaVersion = "0.112.0" - newDB := newTestDocumentDB("0.112.0", "auto", "") - warnings, err := v.ValidateUpdate(context.Background(), oldDB, newDB) - if err != nil { - t.Fatalf("expected no error for same version, got %v", err) - } - if len(warnings) > 0 { - t.Fatalf("expected no warnings, got %v", warnings) - } -} - -func TestValidateUpdate_BlocksSchemaVersionAboveBinary(t *testing.T) { - v := &DocumentDBValidator{} - oldDB := newTestDocumentDB("0.110.0", "", "") - oldDB.Status.SchemaVersion = "0.110.0" - newDB := newTestDocumentDB("0.110.0", "0.112.0", "") - _, err := v.ValidateUpdate(context.Background(), oldDB, newDB) - if err == nil { - t.Fatal("expected error when schemaVersion > binary on update, got nil") - } - t.Logf("Got expected error: %v", err) -} - -func TestValidateUpdate_ImageRollbackBlockedViaImageField(t *testing.T) { - v := &DocumentDBValidator{} - oldDB := newTestDocumentDB("", "", "ghcr.io/documentdb/documentdb:0.112.0") - oldDB.Status.SchemaVersion = "0.112.0" - newDB := newTestDocumentDB("", "", "ghcr.io/documentdb/documentdb:0.110.0") - _, err := v.ValidateUpdate(context.Background(), oldDB, newDB) - if err == nil { - t.Fatal("expected error when rolling back image below schema, got nil") - } - t.Logf("Got expected error: %v", err) -} - -func TestValidateDelete_AlwaysAllowed(t *testing.T) { - v := &DocumentDBValidator{} - db := newTestDocumentDB("0.112.0", "auto", "") - warnings, err := v.ValidateDelete(context.Background(), db) - if err != nil { - t.Fatalf("expected no error on delete, got %v", err) - } - if len(warnings) > 0 { - t.Fatalf("expected no warnings, got %v", warnings) - } -} - -func TestResolveBinaryVersion_PrefersImageTag(t *testing.T) { - db := newTestDocumentDB("0.110.0", "", "ghcr.io/documentdb/documentdb:0.112.0") - v := resolveBinaryVersion(db) - if v != "0.112.0" { - t.Fatalf("expected 0.112.0 from image tag, got %s", v) - } -} - -func TestResolveBinaryVersion_FallsBackToDocumentDBVersion(t *testing.T) { - db := newTestDocumentDB("0.110.0", "", "") - v := resolveBinaryVersion(db) - if v != "0.110.0" { - t.Fatalf("expected 0.110.0, got %s", v) - } -} - -func TestResolveBinaryVersion_ReturnsEmptyWhenNoneSet(t *testing.T) { - db := newTestDocumentDB("", "", "") - v := resolveBinaryVersion(db) - if v != "" { - t.Fatalf("expected empty, got %s", v) - } -} +var _ = Describe("schema version validation", func() { + var v *DocumentDBValidator + + BeforeEach(func() { + v = &DocumentDBValidator{} + }) + + It("allows an empty schemaVersion", func() { + db := newTestDocumentDB("0.112.0", "", "") + result := v.validateSchemaVersionNotExceedsBinary(db) + Expect(result).To(BeEmpty()) + }) + + It("allows schemaVersion set to auto", func() { + db := newTestDocumentDB("0.112.0", "auto", "") + result := v.validateSchemaVersionNotExceedsBinary(db) + Expect(result).To(BeEmpty()) + }) + + It("allows schemaVersion equal to binary version", func() { + db := newTestDocumentDB("0.112.0", "0.112.0", "") + result := v.validateSchemaVersionNotExceedsBinary(db) + Expect(result).To(BeEmpty()) + }) + + It("allows schemaVersion below binary version", func() { + db := newTestDocumentDB("0.112.0", "0.110.0", "") + result := v.validateSchemaVersionNotExceedsBinary(db) + Expect(result).To(BeEmpty()) + }) + + It("rejects schemaVersion above binary version", func() { + db := newTestDocumentDB("0.110.0", "0.112.0", "") + result := v.validateSchemaVersionNotExceedsBinary(db) + Expect(result).To(HaveLen(1)) + Expect(result[0].Detail).To(ContainSubstring("exceeds the binary version")) + }) + + It("allows schemaVersion equal to image tag version", func() { + db := newTestDocumentDB("", "0.112.0", "ghcr.io/documentdb/documentdb:0.112.0") + result := v.validateSchemaVersionNotExceedsBinary(db) + Expect(result).To(BeEmpty()) + }) + + It("rejects schemaVersion above image tag version", func() { + db := newTestDocumentDB("", "0.115.0", "ghcr.io/documentdb/documentdb:0.112.0") + result := v.validateSchemaVersionNotExceedsBinary(db) + Expect(result).To(HaveLen(1)) + Expect(result[0].Detail).To(ContainSubstring("exceeds the binary version")) + }) + + It("skips validation when no binary version can be resolved", func() { + db := newTestDocumentDB("", "0.112.0", "") + result := v.validateSchemaVersionNotExceedsBinary(db) + Expect(result).To(BeEmpty()) + }) +}) + +var _ = Describe("image rollback validation", func() { + var v *DocumentDBValidator + + BeforeEach(func() { + v = &DocumentDBValidator{} + }) + + It("allows upgrade above installed schema version", func() { + oldDB := newTestDocumentDB("0.110.0", "", "") + oldDB.Status.SchemaVersion = "0.110.0" + newDB := newTestDocumentDB("0.112.0", "", "") + result := v.validateImageRollback(newDB, oldDB) + Expect(result).To(BeEmpty()) + }) + + It("blocks image rollback below installed schema version", func() { + oldDB := newTestDocumentDB("0.112.0", "auto", "") + oldDB.Status.SchemaVersion = "0.112.0" + newDB := newTestDocumentDB("0.110.0", "auto", "") + result := v.validateImageRollback(newDB, oldDB) + Expect(result).To(HaveLen(1)) + Expect(result[0].Detail).To(ContainSubstring("image rollback blocked")) + }) + + It("allows rollback when no schema version is installed", func() { + oldDB := newTestDocumentDB("0.112.0", "", "") + newDB := newTestDocumentDB("0.110.0", "", "") + result := v.validateImageRollback(newDB, oldDB) + Expect(result).To(BeEmpty()) + }) + + It("allows same version on update", func() { + oldDB := newTestDocumentDB("0.112.0", "auto", "") + oldDB.Status.SchemaVersion = "0.112.0" + newDB := newTestDocumentDB("0.112.0", "auto", "") + result := v.validateImageRollback(newDB, oldDB) + Expect(result).To(BeEmpty()) + }) + + It("blocks image rollback via documentDBImage field", func() { + oldDB := newTestDocumentDB("", "", "ghcr.io/documentdb/documentdb:0.112.0") + oldDB.Status.SchemaVersion = "0.112.0" + newDB := newTestDocumentDB("", "", "ghcr.io/documentdb/documentdb:0.110.0") + result := v.validateImageRollback(newDB, oldDB) + Expect(result).To(HaveLen(1)) + Expect(result[0].Detail).To(ContainSubstring("image rollback blocked")) + }) +}) + +var _ = Describe("ValidateCreate admission handler", func() { + var v *DocumentDBValidator + + BeforeEach(func() { + v = &DocumentDBValidator{} + }) + + It("allows a valid DocumentDB resource", func() { + db := newTestDocumentDB("0.112.0", "", "") + warnings, err := v.ValidateCreate(context.Background(), db) + Expect(err).ToNot(HaveOccurred()) + Expect(warnings).To(BeEmpty()) + }) + + It("rejects a resource with schemaVersion above binary", func() { + db := newTestDocumentDB("0.110.0", "0.112.0", "") + _, err := v.ValidateCreate(context.Background(), db) + Expect(err).To(HaveOccurred()) + }) +}) + +var _ = Describe("ValidateUpdate admission handler", func() { + var v *DocumentDBValidator + + BeforeEach(func() { + v = &DocumentDBValidator{} + }) + + It("allows a valid upgrade", func() { + oldDB := newTestDocumentDB("0.110.0", "", "") + oldDB.Status.SchemaVersion = "0.110.0" + newDB := newTestDocumentDB("0.112.0", "", "") + warnings, err := v.ValidateUpdate(context.Background(), oldDB, newDB) + Expect(err).ToNot(HaveOccurred()) + Expect(warnings).To(BeEmpty()) + }) + + It("rejects rollback below installed schema version", func() { + oldDB := newTestDocumentDB("0.112.0", "auto", "") + oldDB.Status.SchemaVersion = "0.112.0" + newDB := newTestDocumentDB("0.110.0", "auto", "") + _, err := v.ValidateUpdate(context.Background(), oldDB, newDB) + Expect(err).To(HaveOccurred()) + }) + + It("rejects schemaVersion above binary on update", func() { + oldDB := newTestDocumentDB("0.110.0", "", "") + oldDB.Status.SchemaVersion = "0.110.0" + newDB := newTestDocumentDB("0.110.0", "0.112.0", "") + _, err := v.ValidateUpdate(context.Background(), oldDB, newDB) + Expect(err).To(HaveOccurred()) + }) +}) + +var _ = Describe("ValidateDelete admission handler", func() { + It("always allows deletion", func() { + v := &DocumentDBValidator{} + db := newTestDocumentDB("0.112.0", "auto", "") + warnings, err := v.ValidateDelete(context.Background(), db) + Expect(err).ToNot(HaveOccurred()) + Expect(warnings).To(BeEmpty()) + }) +}) + +var _ = Describe("resolveBinaryVersion helper", func() { + It("prefers the image tag over documentDBVersion", func() { + db := newTestDocumentDB("0.110.0", "", "ghcr.io/documentdb/documentdb:0.112.0") + Expect(resolveBinaryVersion(db)).To(Equal("0.112.0")) + }) + + It("falls back to documentDBVersion when no image is set", func() { + db := newTestDocumentDB("0.110.0", "", "") + Expect(resolveBinaryVersion(db)).To(Equal("0.110.0")) + }) + + It("returns empty when neither image nor version is set", func() { + db := newTestDocumentDB("", "", "") + Expect(resolveBinaryVersion(db)).To(BeEmpty()) + }) +}) diff --git a/operator/src/internal/webhook/suite_test.go b/operator/src/internal/webhook/suite_test.go new file mode 100644 index 00000000..b2c64e96 --- /dev/null +++ b/operator/src/internal/webhook/suite_test.go @@ -0,0 +1,16 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +package webhook + +import ( + "testing" + + . "github.com/onsi/ginkgo/v2" + . "github.com/onsi/gomega" +) + +func TestWebhook(t *testing.T) { + RegisterFailHandler(Fail) + RunSpecs(t, "Webhook Suite") +} From fe29184de0eb7a0286f028886818f6e6466d4bad Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Thu, 26 Mar 2026 21:00:19 -0400 Subject: [PATCH 05/13] docs: fix release tag format in CRD upgrade command (remove v prefix) Signed-off-by: Wenting Wu --- .../preview/operations/upgrades.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/operator-public-documentation/preview/operations/upgrades.md b/docs/operator-public-documentation/preview/operations/upgrades.md index e8a8e4b9..f0fe5491 100644 --- a/docs/operator-public-documentation/preview/operations/upgrades.md +++ b/docs/operator-public-documentation/preview/operations/upgrades.md @@ -47,8 +47,8 @@ helm search repo documentdb/documentdb-operator --versions Helm only installs CRDs on initial `helm install` — it does **not** update them on `helm upgrade`. If the new operator version introduces CRD schema changes, you must apply them manually first: ```bash -# Set this to the release tag you are upgrading to (e.g., v0.2.0) -TARGET_VERSION=v0.2.0 +# Set this to the release tag you are upgrading to (e.g., 0.2.0) +TARGET_VERSION=0.2.0 kubectl apply --server-side --force-conflicts \ -f https://raw.githubusercontent.com/documentdb/documentdb-kubernetes-operator/${TARGET_VERSION}/operator/documentdb-helm-chart/crds/documentdb.io_dbs.yaml \ From f19bfdb575ac72c60f709c785dfff6190ac72a8f Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Thu, 26 Mar 2026 21:56:22 -0400 Subject: [PATCH 06/13] docs: improve upgrade documentation with control/data plane framing - Reframe upgrade types as Operator (control plane) vs DocumentDB (data plane) - Explain documentDBVersion vs schemaVersion relationship clearly - Reorganize data plane upgrade as step-by-step walkthrough with production/dev tabs - Add Multi-Region Upgrades section with coordination guidance - Move Advanced Image Overrides to its own section Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../preview/operations/upgrades.md | 131 +++++++++++------- 1 file changed, 80 insertions(+), 51 deletions(-) diff --git a/docs/operator-public-documentation/preview/operations/upgrades.md b/docs/operator-public-documentation/preview/operations/upgrades.md index f0fe5491..6bc58fd4 100644 --- a/docs/operator-public-documentation/preview/operations/upgrades.md +++ b/docs/operator-public-documentation/preview/operations/upgrades.md @@ -1,6 +1,6 @@ --- title: Upgrades -description: Upgrade the DocumentDB operator Helm chart and CRDs, update DocumentDB extension and gateway images per cluster, and roll back to a previous version. +description: Upgrade the DocumentDB operator (control plane) and DocumentDB clusters (data plane), including version and schema management. tags: - operations - upgrades @@ -13,19 +13,21 @@ tags: Upgrades keep your DocumentDB deployment current with the latest features, security patches, and bug fixes. -There are two types of upgrades in a DocumentDB deployment: +A DocumentDB deployment has two independently upgradable layers: -| Upgrade Type | What Changes | How to Trigger | -|-------------|-------------|----------------| -| **DocumentDB operator** | The Kubernetes operator + bundled CloudNative-PG | Helm chart upgrade | -| **DocumentDB components** | Extension + gateway (same version) | Update `spec.documentDBVersion` | +| Layer | What It Is | What Changes | How to Trigger | +|-------|-----------|-------------|----------------| +| **DocumentDB Operator** (control plane) | The Kubernetes operator that manages your clusters | Operator binary + bundled CloudNative-PG | `helm upgrade` | +| **DocumentDB** (data plane) | The database engine running inside your cluster | Extension binary + gateway sidecar + database schema | Update `spec.documentDBVersion` and `spec.schemaVersion` | !!! info - The operator Helm chart bundles [CloudNative-PG](https://cloudnative-pg.io/) as a dependency. Upgrading the operator automatically upgrades the bundled CloudNative-PG version — this cannot be skipped separately. + The operator Helm chart bundles [CloudNative-PG](https://cloudnative-pg.io/) as a dependency. Upgrading the operator automatically upgrades the bundled CloudNative-PG version. -## DocumentDB Operator Upgrade +--- + +## Upgrading the Operator (Control Plane) -The DocumentDB operator is deployed via Helm. Upgrade it by updating the Helm release. +The operator is deployed via Helm. Upgrading it does **not** restart your DocumentDB cluster pods or change any data plane components. ### Step 1: Update the Helm Repository @@ -61,7 +63,7 @@ Server-side apply (`--server-side --force-conflicts`) is required because the Do !!! warning Always use CRDs from the **same version** as the Helm chart you are installing. Using CRDs from `main` or a different release may introduce schema mismatches. -### Step 4: Upgrade the DocumentDB Operator +### Step 4: Upgrade the Operator ```bash helm upgrade documentdb-operator documentdb/documentdb-operator \ @@ -88,7 +90,7 @@ kubectl get deployment -n documentdb-operator kubectl logs -n documentdb-operator deployment/documentdb-operator --tail=50 ``` -### Rollback +### Operator Rollback If the new operator version causes issues, roll back to the previous Helm release: @@ -103,30 +105,29 @@ helm rollback documentdb-operator -n documentdb-operator !!! note `helm rollback` reverts the operator deployment but does **not** revert CRDs. This is usually safe — CRD changes are additive, and the older operator ignores fields it does not recognize. Do **not** revert CRDs unless the release notes explicitly instruct you to, as removing fields from a CRD can invalidate existing resources. -### DocumentDB Operator Upgrade Notes - -- The DocumentDB operator upgrade does **not** restart your DocumentDB cluster pods. -- After upgrading the operator, update individual DocumentDB clusters to the latest component version. See [Component Upgrades](#component-upgrades) below. +--- -## Component Upgrades +## Upgrading DocumentDB (Data Plane) -Updating `spec.documentDBVersion` upgrades **both** the DocumentDB extension and the gateway together, since they share the same version. +Upgrading DocumentDB involves two distinct steps that can be performed together or separately: -### Schema Version Control +| Field | What It Does | Reversible? | +|-------|-------------|-------------| +| `spec.documentDBVersion` | Updates the **binary** — the extension image and gateway sidecar are replaced via rolling restart. | ✅ Yes — revert the field to roll back. | +| `spec.schemaVersion` | Runs `ALTER EXTENSION UPDATE` to migrate the **database schema** to match the binary. | ❌ No — schema changes are permanent. | -**The operator never changes your database schema unless you ask.** +Think of it as: **`documentDBVersion` installs the software, `schemaVersion` applies the database migration.** -- Set `documentDBVersion` → updates the binary (safe to roll back) -- Set `schemaVersion` → updates the database schema (irreversible) -- Set `schemaVersion: "auto"` → schema auto-updates with binary +!!! info "Why two fields?" + The binary (container image) can be swapped freely — if something goes wrong, revert `documentDBVersion` and the pods roll back to the previous image. But `ALTER EXTENSION UPDATE` modifies database catalog tables and cannot be undone. Separating these two steps gives you a safe rollback window between deploying new code and committing the schema change. -Think of it as: **`documentDBVersion` installs the software, `schemaVersion` applies the database migration.** +### Schema Version Modes -| `spec.schemaVersion` | Behavior | Use case | -|----------------------|----------|----------| -| Not set (default) | Binary upgrades. Schema stays at current version until you set `schemaVersion`. | Production — gives you a rollback-safe window before committing the schema change. | -| `"auto"` | Schema updates automatically with binary. No rollback window. | Development and testing — simple, one-step upgrades. | -| Explicit version (e.g. `"0.112.0"`) | Schema updates to exactly that version. | Controlled rollouts — you choose when and what version to finalize. | +| `spec.schemaVersion` | Behavior | Recommended For | +|----------------------|----------|-----------------| +| *(not set)* — default | Binary upgrades. Schema stays at its current version until you explicitly set `schemaVersion`. | **Production** — gives you a rollback-safe window before committing the schema change. | +| `"auto"` | Schema updates automatically whenever the binary version changes. | **Development and testing** — simple, one-step upgrades. | +| Explicit version (e.g., `"0.112.0"`) | Schema updates to exactly that version. | **Controlled rollouts** — you choose when and what version to finalize. | ### Pre-Upgrade Checklist @@ -134,14 +135,13 @@ Think of it as: **`documentDBVersion` installs the software, `schemaVersion` app 2. **Verify DocumentDB cluster health** — ensure all instances are running and healthy. 3. **Back up the DocumentDB cluster** — create an on-demand [backup](backup-and-restore.md) before upgrading. -### Step 1: Update the DocumentDB Version +### Upgrade Walkthrough -!!! danger - **Downgrades are not supported.** Once the schema has been updated (`status.schemaVersion` shows the new version), the operator **blocks** any attempt to set `documentDBVersion` to a version lower than the installed schema version. If you need to recover after a schema update, restore from backup. See [Rollback and Recovery](#rollback-and-recovery). +Choose the approach that matches your use case: -=== "Default (controlled upgrade)" +=== "Production (two-phase upgrade)" - Update only the binary version. The schema stays at the current version until you explicitly finalize: + **Step 1: Update the binary version.** The schema stays unchanged — this is safe to roll back. ```yaml title="documentdb.yaml" apiVersion: documentdb.io/preview @@ -151,14 +151,27 @@ Think of it as: **`documentDBVersion` installs the software, `schemaVersion` app namespace: default spec: documentDBVersion: "" - # schemaVersion is not set — schema stays, safe to roll back + # schemaVersion is not set — schema stays at current version ``` ```bash kubectl apply -f documentdb.yaml ``` - After the binary upgrade completes and you've validated the cluster, finalize the schema: + **Step 2: Validate.** Confirm the cluster is healthy and the new binary works as expected. + + ```bash + # Watch the rolling restart + kubectl get pods -n default -w + + # Check cluster status + kubectl get documentdb my-cluster -n default + + # Verify the schema version has NOT changed + kubectl get documentdb my-cluster -n default -o jsonpath='{.status.schemaVersion}' + ``` + + **Step 3: Finalize the schema.** Once you're confident the new binary is stable, commit the schema migration: ```bash kubectl patch documentdb my-cluster -n default \ @@ -168,7 +181,7 @@ Think of it as: **`documentDBVersion` installs the software, `schemaVersion` app !!! tip On subsequent upgrades, just update `documentDBVersion` again. The schema stays pinned at the previous `schemaVersion` value until you update it. -=== "Auto mode" +=== "Development (auto mode)" Update both the binary and schema in one step: @@ -190,7 +203,7 @@ Think of it as: **`documentDBVersion` installs the software, `schemaVersion` app !!! warning With `schemaVersion: "auto"`, the schema migration is irreversible once applied. You cannot roll back to the previous version — only restore from backup. -### Step 2: Monitor the Upgrade +### Monitoring the Upgrade ```bash # Watch the rolling restart @@ -199,7 +212,7 @@ kubectl get pods -n default -w # Check DocumentDB cluster status kubectl get documentdb my-cluster -n default -# Check the schema version after upgrade +# Check the current schema version kubectl get documentdb my-cluster -n default -o jsonpath='{.status.schemaVersion}' ``` @@ -209,14 +222,14 @@ Whether you can roll back depends on whether the schema has been updated: === "Schema not yet updated (rollback safe)" - If `status.schemaVersion` still shows the **previous** version, the extension schema migration has not run yet. You can safely roll back by reverting `spec.documentDBVersion` to the previous value: + If `status.schemaVersion` still shows the **previous** version, the schema migration has not run yet. You can safely roll back by reverting `spec.documentDBVersion`: ```bash - # Check the current schema version + # Verify the schema version is unchanged kubectl get documentdb my-cluster -n default -o jsonpath='{.status.schemaVersion}' ``` - If the schema version is unchanged, revert the `spec.documentDBVersion` field in your manifest and reapply: + If the schema version is unchanged, revert `spec.documentDBVersion` in your manifest and reapply: ```bash kubectl apply -f documentdb.yaml @@ -224,25 +237,41 @@ Whether you can roll back depends on whether the schema has been updated: === "Schema already updated (rollback blocked)" - If `status.schemaVersion` shows the **new** version, the schema migration has already been applied and **cannot be reversed**. The operator will **block** any attempt to roll back the binary image below the installed schema version, because running an older binary against a newer schema is untested and may cause data corruption. + If `status.schemaVersion` shows the **new** version, the schema migration has already been applied and **cannot be reversed**. The operator **blocks** any attempt to set `documentDBVersion` to a version lower than the installed schema version, because running an older binary against a newer schema may cause data corruption. To recover: Use the backup you created in the [Pre-Upgrade Checklist](#pre-upgrade-checklist) to restore the DocumentDB cluster to its pre-upgrade state. See [Backup and Restore](backup-and-restore.md) for instructions. !!! tip This is why the default two-phase mode exists — it gives you a rollback-safe window before committing the schema change. Always back up before upgrading, and validate the new binary before setting `schemaVersion`. -### How It Works +### How It Works Internally -1. You update the `spec.documentDBVersion` field. -2. The operator detects the version change and updates both the database image and the gateway sidecar image. -3. The underlying cluster manager performs a **rolling restart**: replicas are restarted first one at a time, then the **primary is restarted in place**. Expect a brief period of downtime while the primary pod restarts. +1. You update `spec.documentDBVersion`. +2. The operator updates the extension and gateway container images. +3. The underlying cluster manager performs a **rolling restart**: replicas restart first, then the **primary restarts in place**. Expect a brief period of downtime while the primary pod restarts. 4. After the primary pod is healthy, the operator checks `spec.schemaVersion`: - - **Not set (default)**: The operator **skips** the schema migration and emits a `SchemaUpdateAvailable` event. The binary is updated but the database schema is unchanged — you can safely roll back by reverting `documentDBVersion`. - - **`"auto"`**: The operator runs `ALTER EXTENSION documentdb UPDATE` to update the database schema to match the binary. This is irreversible. - - **Explicit version**: The operator runs `ALTER EXTENSION documentdb UPDATE TO ''` to update to that specific version. This is irreversible. -5. The operator tracks the current schema version in `status.schemaVersion`. + - **Not set (default)**: The operator **skips** the schema migration and emits a `SchemaUpdateAvailable` event. You can safely roll back by reverting `documentDBVersion`. + - **`"auto"`**: The operator runs `ALTER EXTENSION documentdb UPDATE` to update the schema to match the binary. This is irreversible. + - **Explicit version**: The operator runs `ALTER EXTENSION documentdb UPDATE TO ''` to update to that exact version. This is irreversible. +5. The operator records the installed schema version in `status.schemaVersion`. + +--- + +## Multi-Region Upgrades + +When running DocumentDB across multiple regions or clusters, coordinate upgrades carefully: + +1. **Upgrade replica regions first.** Start with read-only replica clusters. Validate health and replication before touching the primary region. +2. **Upgrade the primary region last.** Once all replicas are running the new binary successfully, upgrade the primary. +3. **Keep schema versions consistent.** After finalizing `schemaVersion` on the primary, update it on replica clusters promptly. Running mismatched schema versions across regions for extended periods is not recommended. +4. **Back up before each region's upgrade.** Create a backup in each region before starting its upgrade. + +!!! note + Multi-region upgrade orchestration is performed manually — the operator manages individual clusters and does not coordinate across regions automatically. + +--- -### Advanced: Independent Image Overrides +## Advanced: Independent Image Overrides In most cases, use `spec.documentDBVersion` to upgrade both components together. For advanced scenarios, you can override individual images: From 4ec8812f6d0783c3ad9970a28fa0602823cb5441 Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Fri, 27 Mar 2026 19:45:27 -0400 Subject: [PATCH 07/13] docs: replace control/data plane with operator/clusters framing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../preview/operations/upgrades.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/operator-public-documentation/preview/operations/upgrades.md b/docs/operator-public-documentation/preview/operations/upgrades.md index 6bc58fd4..33aacbad 100644 --- a/docs/operator-public-documentation/preview/operations/upgrades.md +++ b/docs/operator-public-documentation/preview/operations/upgrades.md @@ -1,6 +1,6 @@ --- title: Upgrades -description: Upgrade the DocumentDB operator (control plane) and DocumentDB clusters (data plane), including version and schema management. +description: Upgrade the DocumentDB operator and DocumentDB clusters, including version and schema management. tags: - operations - upgrades @@ -13,21 +13,21 @@ tags: Upgrades keep your DocumentDB deployment current with the latest features, security patches, and bug fixes. -A DocumentDB deployment has two independently upgradable layers: +A DocumentDB deployment has two independently upgradable components: -| Layer | What It Is | What Changes | How to Trigger | -|-------|-----------|-------------|----------------| -| **DocumentDB Operator** (control plane) | The Kubernetes operator that manages your clusters | Operator binary + bundled CloudNative-PG | `helm upgrade` | -| **DocumentDB** (data plane) | The database engine running inside your cluster | Extension binary + gateway sidecar + database schema | Update `spec.documentDBVersion` and `spec.schemaVersion` | +| Component | What Changes | How to Trigger | +|-----------|-------------|----------------| +| **DocumentDB Operator** | Operator binary + bundled CloudNative-PG | `helm upgrade` | +| **DocumentDB Clusters** | Extension binary + gateway sidecar + database schema | Update `spec.documentDBVersion` and `spec.schemaVersion` | !!! info The operator Helm chart bundles [CloudNative-PG](https://cloudnative-pg.io/) as a dependency. Upgrading the operator automatically upgrades the bundled CloudNative-PG version. --- -## Upgrading the Operator (Control Plane) +## Upgrading the Operator -The operator is deployed via Helm. Upgrading it does **not** restart your DocumentDB cluster pods or change any data plane components. +The operator is deployed via Helm. Upgrading it does **not** restart your DocumentDB cluster pods or change any cluster components. ### Step 1: Update the Helm Repository @@ -107,7 +107,7 @@ helm rollback documentdb-operator -n documentdb-operator --- -## Upgrading DocumentDB (Data Plane) +## Upgrading DocumentDB Clusters Upgrading DocumentDB involves two distinct steps that can be performed together or separately: From c5a9dca48498d70e1537f5c0fdebba4a66902651 Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Fri, 27 Mar 2026 19:52:19 -0400 Subject: [PATCH 08/13] docs: add rolling safety gap upgrade pattern to walkthrough Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../preview/operations/upgrades.md | 38 +++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/docs/operator-public-documentation/preview/operations/upgrades.md b/docs/operator-public-documentation/preview/operations/upgrades.md index 33aacbad..fdec21b2 100644 --- a/docs/operator-public-documentation/preview/operations/upgrades.md +++ b/docs/operator-public-documentation/preview/operations/upgrades.md @@ -181,6 +181,44 @@ Choose the approach that matches your use case: !!! tip On subsequent upgrades, just update `documentDBVersion` again. The schema stays pinned at the previous `schemaVersion` value until you update it. +=== "Production (rolling safety gap)" + + Keep the binary always one version ahead of the schema. This ensures you can roll back at any time because the running binary has already been validated with the current schema. + + **Example:** Your cluster is at binary `0.110.0` with schema `0.110.0`. A new version `0.112.0` is available. + + **Step 1: Upgrade the binary and finalize the *previous* schema together.** + + ```yaml title="documentdb.yaml" + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-cluster + namespace: default + spec: + documentDBVersion: "0.112.0" # upgrade binary to new version + schemaVersion: "0.110.0" # finalize schema to current (previous) version + ``` + + ```bash + kubectl apply -f documentdb.yaml + ``` + + Now the binary is `0.112.0` and the schema is `0.110.0`. The new binary is designed to work with both the old and new schema, so this is safe. + + **Step 2: Validate.** Run your tests. If something goes wrong, revert `documentDBVersion` to `0.110.0` — the schema is still at `0.110.0`, so rollback is safe. + + **On the next upgrade** (e.g., `0.114.0`), repeat the pattern: + + ```yaml + spec: + documentDBVersion: "0.114.0" # upgrade binary to next version + schemaVersion: "0.112.0" # finalize schema to previous binary version + ``` + + !!! info + This pattern keeps a permanent rollback window. The schema is always one version behind the binary, so you never commit a schema change until the *next* binary has proven stable with it. + === "Development (auto mode)" Update both the binary and schema in one step: From b4ebfe17f752105a2a0611a4da957c44911a52e2 Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Fri, 27 Mar 2026 19:54:30 -0400 Subject: [PATCH 09/13] docs: use sequential versions in rolling safety gap example Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../preview/operations/upgrades.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/operator-public-documentation/preview/operations/upgrades.md b/docs/operator-public-documentation/preview/operations/upgrades.md index fdec21b2..3cadf63b 100644 --- a/docs/operator-public-documentation/preview/operations/upgrades.md +++ b/docs/operator-public-documentation/preview/operations/upgrades.md @@ -185,7 +185,7 @@ Choose the approach that matches your use case: Keep the binary always one version ahead of the schema. This ensures you can roll back at any time because the running binary has already been validated with the current schema. - **Example:** Your cluster is at binary `0.110.0` with schema `0.110.0`. A new version `0.112.0` is available. + **Example:** Your cluster is at binary `0.110.0` with schema `0.110.0`. A new version `0.111.0` is available. **Step 1: Upgrade the binary and finalize the *previous* schema together.** @@ -196,7 +196,7 @@ Choose the approach that matches your use case: name: my-cluster namespace: default spec: - documentDBVersion: "0.112.0" # upgrade binary to new version + documentDBVersion: "0.111.0" # upgrade binary to new version schemaVersion: "0.110.0" # finalize schema to current (previous) version ``` @@ -204,16 +204,16 @@ Choose the approach that matches your use case: kubectl apply -f documentdb.yaml ``` - Now the binary is `0.112.0` and the schema is `0.110.0`. The new binary is designed to work with both the old and new schema, so this is safe. + Now the binary is `0.111.0` and the schema is `0.110.0`. The new binary is designed to work with both the old and new schema, so this is safe. **Step 2: Validate.** Run your tests. If something goes wrong, revert `documentDBVersion` to `0.110.0` — the schema is still at `0.110.0`, so rollback is safe. - **On the next upgrade** (e.g., `0.114.0`), repeat the pattern: + **On the next upgrade** (e.g., `0.112.0`), repeat the pattern: ```yaml spec: - documentDBVersion: "0.114.0" # upgrade binary to next version - schemaVersion: "0.112.0" # finalize schema to previous binary version + documentDBVersion: "0.112.0" # upgrade binary to next version + schemaVersion: "0.111.0" # finalize schema to previous binary version ``` !!! info From c1ff7d25b225b93f2920a293d4392915529b86e8 Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Fri, 27 Mar 2026 19:55:44 -0400 Subject: [PATCH 10/13] docs: clarify rollback rules schema is permanent, version floor enforced Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../preview/operations/upgrades.md | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/docs/operator-public-documentation/preview/operations/upgrades.md b/docs/operator-public-documentation/preview/operations/upgrades.md index 3cadf63b..2db8e26f 100644 --- a/docs/operator-public-documentation/preview/operations/upgrades.md +++ b/docs/operator-public-documentation/preview/operations/upgrades.md @@ -256,6 +256,11 @@ kubectl get documentdb my-cluster -n default -o jsonpath='{.status.schemaVersion ### Rollback and Recovery +Two rules govern rollback: + +1. **Schema cannot be rolled back.** `ALTER EXTENSION UPDATE` modifies database catalog tables permanently. There is no `ALTER EXTENSION DOWNGRADE`. +2. **`documentDBVersion` cannot be set below `status.schemaVersion`.** The operator blocks this because running an older binary against a newer schema is untested and may cause data corruption. + Whether you can roll back depends on whether the schema has been updated: === "Schema not yet updated (rollback safe)" @@ -273,11 +278,14 @@ Whether you can roll back depends on whether the schema has been updated: kubectl apply -f documentdb.yaml ``` -=== "Schema already updated (rollback blocked)" +=== "Schema already updated (restore from backup)" + + If `status.schemaVersion` shows the **new** version, the schema migration has already been applied. At this point: - If `status.schemaVersion` shows the **new** version, the schema migration has already been applied and **cannot be reversed**. The operator **blocks** any attempt to set `documentDBVersion` to a version lower than the installed schema version, because running an older binary against a newer schema may cause data corruption. + - You **cannot** revert `schemaVersion` — the database schema change is permanent. + - You **cannot** set `documentDBVersion` to a version older than `status.schemaVersion` — the operator rejects it. - To recover: Use the backup you created in the [Pre-Upgrade Checklist](#pre-upgrade-checklist) to restore the DocumentDB cluster to its pre-upgrade state. See [Backup and Restore](backup-and-restore.md) for instructions. + To recover: restore from the backup you created in the [Pre-Upgrade Checklist](#pre-upgrade-checklist). See [Backup and Restore](backup-and-restore.md) for instructions. !!! tip This is why the default two-phase mode exists — it gives you a rollback-safe window before committing the schema change. Always back up before upgrading, and validate the new binary before setting `schemaVersion`. From e67f5a79982b799ddf53e4a150053bdb5c77bf9f Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Fri, 27 Mar 2026 21:03:39 -0400 Subject: [PATCH 11/13] feat: set primaryUpdateMethod to switchover for safer rolling updates CNPG default is in-place restart. Switchover promotes a replica first, then restarts the old primary as replica, minimizing write downtime. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../preview/operations/upgrades.md | 2 +- operator/src/internal/cnpg/cnpg_cluster.go | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/operator-public-documentation/preview/operations/upgrades.md b/docs/operator-public-documentation/preview/operations/upgrades.md index 2db8e26f..b81fba8e 100644 --- a/docs/operator-public-documentation/preview/operations/upgrades.md +++ b/docs/operator-public-documentation/preview/operations/upgrades.md @@ -294,7 +294,7 @@ Whether you can roll back depends on whether the schema has been updated: 1. You update `spec.documentDBVersion`. 2. The operator updates the extension and gateway container images. -3. The underlying cluster manager performs a **rolling restart**: replicas restart first, then the **primary restarts in place**. Expect a brief period of downtime while the primary pod restarts. +3. The underlying cluster manager performs a **rolling update**: replicas are restarted first one at a time, then the primary is updated via **switchover** — a healthy replica is promoted to primary, and the old primary restarts as a replica. This minimizes downtime. 4. After the primary pod is healthy, the operator checks `spec.schemaVersion`: - **Not set (default)**: The operator **skips** the schema migration and emits a `SchemaUpdateAvailable` event. You can safely roll back by reverting `documentDBVersion`. - **`"auto"`**: The operator runs `ALTER EXTENSION documentdb UPDATE` to update the schema to match the binary. This is irreversible. diff --git a/operator/src/internal/cnpg/cnpg_cluster.go b/operator/src/internal/cnpg/cnpg_cluster.go index 614469fc..6d7faa17 100644 --- a/operator/src/internal/cnpg/cnpg_cluster.go +++ b/operator/src/internal/cnpg/cnpg_cluster.go @@ -65,8 +65,9 @@ func GetCnpgClusterSpec(req ctrl.Request, documentdb *dbpreview.DocumentDB, docu }, Spec: func() cnpgv1.ClusterSpec { spec := cnpgv1.ClusterSpec{ - Instances: documentdb.Spec.InstancesPerNode, - ImageName: documentdb.Spec.PostgresImage, + Instances: documentdb.Spec.InstancesPerNode, + ImageName: documentdb.Spec.PostgresImage, + PrimaryUpdateMethod: cnpgv1.PrimaryUpdateMethodSwitchover, StorageConfiguration: cnpgv1.StorageConfiguration{ StorageClass: storageClassPointer, // Use configured storage class or default Size: documentdb.Spec.Resource.Storage.PvcSize, From c5c4e0069e293a633b6405d9ffeb3e2ed6af096a Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Fri, 27 Mar 2026 21:04:09 -0400 Subject: [PATCH 12/13] docs: improve rollback tab titles with clearer scenarios Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../preview/operations/upgrades.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/operator-public-documentation/preview/operations/upgrades.md b/docs/operator-public-documentation/preview/operations/upgrades.md index b81fba8e..d39c7945 100644 --- a/docs/operator-public-documentation/preview/operations/upgrades.md +++ b/docs/operator-public-documentation/preview/operations/upgrades.md @@ -263,9 +263,9 @@ Two rules govern rollback: Whether you can roll back depends on whether the schema has been updated: -=== "Schema not yet updated (rollback safe)" +=== "Roll back binary only (documentDBVersion ≥ schemaVersion)" - If `status.schemaVersion` still shows the **previous** version, the schema migration has not run yet. You can safely roll back by reverting `spec.documentDBVersion`: + If `status.schemaVersion` still shows the **previous** version, the schema migration has not run yet. You can safely roll back by reverting `spec.documentDBVersion` to any version that is ≥ `status.schemaVersion`: ```bash # Verify the schema version is unchanged @@ -278,12 +278,12 @@ Whether you can roll back depends on whether the schema has been updated: kubectl apply -f documentdb.yaml ``` -=== "Schema already updated (restore from backup)" +=== "Roll back both binary and schema (restore from backup)" If `status.schemaVersion` shows the **new** version, the schema migration has already been applied. At this point: - You **cannot** revert `schemaVersion` — the database schema change is permanent. - - You **cannot** set `documentDBVersion` to a version older than `status.schemaVersion` — the operator rejects it. + - You **cannot** set `documentDBVersion` below `status.schemaVersion` — the operator rejects it. To recover: restore from the backup you created in the [Pre-Upgrade Checklist](#pre-upgrade-checklist). See [Backup and Restore](backup-and-restore.md) for instructions. From 3910dfdfddef4d32706ded2c8f18f60e569843bf Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Fri, 27 Mar 2026 21:04:34 -0400 Subject: [PATCH 13/13] docs: reframe multi-region upgrades as binary-first-then-schema Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../preview/operations/upgrades.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/docs/operator-public-documentation/preview/operations/upgrades.md b/docs/operator-public-documentation/preview/operations/upgrades.md index d39c7945..089002cc 100644 --- a/docs/operator-public-documentation/preview/operations/upgrades.md +++ b/docs/operator-public-documentation/preview/operations/upgrades.md @@ -305,12 +305,11 @@ Whether you can roll back depends on whether the schema has been updated: ## Multi-Region Upgrades -When running DocumentDB across multiple regions or clusters, coordinate upgrades carefully: +When running DocumentDB across multiple regions or clusters, use the two-phase upgrade pattern across all regions: -1. **Upgrade replica regions first.** Start with read-only replica clusters. Validate health and replication before touching the primary region. -2. **Upgrade the primary region last.** Once all replicas are running the new binary successfully, upgrade the primary. -3. **Keep schema versions consistent.** After finalizing `schemaVersion` on the primary, update it on replica clusters promptly. Running mismatched schema versions across regions for extended periods is not recommended. -4. **Back up before each region's upgrade.** Create a backup in each region before starting its upgrade. +1. **Upgrade the binary in all regions first.** Update `spec.documentDBVersion` in every cluster. Validate that all regions are healthy and replication is working correctly with the new binary. +2. **Finalize the schema in all regions.** Once every region is running the new binary successfully, set `spec.schemaVersion` across all clusters. This keeps a rollback-safe window — if any region fails the binary upgrade, you can revert `documentDBVersion` everywhere before any schema change is committed. +3. **Back up before each region's upgrade.** Create a backup in each region before starting its upgrade. !!! note Multi-region upgrade orchestration is performed manually — the operator manages individual clusters and does not coordinate across regions automatically.