# feat(ledger): add v3 reconciler with Raft StatefulSet support #434
Conversation
Ledger v3 uses Raft consensus with Pebble embedded storage instead of PostgreSQL. This adds a version-branched reconciler that creates a StatefulSet (with a headless service for peer discovery) when the ledger version is >= v3.0.0-alpha.

Key changes:
- Version gate in Reconcile(): v3+ skips Database/migrations entirely
- StatefulSet with OrderedReady policy and 3 PVCs (wal, data, cold-cache)
- Headless service (ledger-raft) for Raft peer DNS discovery
- ClusterIP service mapping port 8080→9000 for gateway compatibility
- Pod entrypoint script: computes node-id from ordinal, bootstrap/join
- Settings-driven: replicas, PVC sizes, storage classes, Pebble/Raft tunables
- RBAC marker for apps/statefulsets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
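The "node-id from ordinal, bootstrap/join" behavior described above can be sketched as a tiny shell fragment. This is a hypothetical simplification, not the actual entrypoint script; the variable names are illustrative:

```shell
# Simulate the pod name a StatefulSet pod sees via the downward API.
POD_NAME="ledger-2"

# The StatefulSet ordinal is the suffix after the last dash in the pod name.
ORDINAL="${POD_NAME##*-}"
NODE_ID="$ORDINAL"

# Ordinal 0 bootstraps the Raft cluster; every other pod joins pod-0.
if [ "$ORDINAL" = "0" ]; then
  MODE="bootstrap"
else
  MODE="join"
fi

echo "node-id=$NODE_ID mode=$MODE"   # prints: node-id=2 mode=join
```

Per the PR description, the real script's bootstrap branch additionally checks for an existing checkpoint, so restarts of pod-0 perform a normal start rather than re-bootstrapping.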
**Walkthrough:** When a Stack version is semver-major v3, reconciliation immediately delegates to a new v3 flow that installs a Gateway, a headless Raft Service, and a StatefulSet with per-pod PVCs, probes, env/command wiring, and v3-specific settings; RBAC/owner metadata is updated for StatefulSets.
**Sequence Diagram**

```mermaid
sequenceDiagram
    participant Controller as Controller
    participant Init as Reconciler (init.go)
    participant V3 as ReconcileV3 (v3.go)
    participant K8s as Kubernetes API
    Controller->>Init: Trigger reconciliation
    Init->>Init: Parse version -> isV3?
    alt isV3 == true
        Init->>V3: Delegate to v3 reconciliation
        V3->>K8s: Resolve image/config
        K8s-->>V3: Image/config
        V3->>K8s: Create/Update Gateway (Service + HTTP health)
        V3->>K8s: Create/Update headless Service (ledger-raft)
        V3->>K8s: Create/Update StatefulSet (pod template, PVCs, probes)
        K8s-->>V3: Resources applied
        V3-->>Init: v3 reconcile result
    else isV3 == false
        Init->>K8s: Proceed with existing v2 reconciliation
        K8s-->>Init: v2 resources reconciled
    end
    Init-->>Controller: Reconciliation complete
```
**Estimated code review effort:** 🎯 4 (Complex) | ⏱️ ~45 minutes
Document all v3-specific Settings keys in both:
- Settings reference table (01-Settings.md)
- Ledger module page (03-Ledger.md) with architecture overview, YAML examples, and tables for persistence/Pebble/Raft tunables

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/resources/ledgers/init.go`:
- Around line 44-47: The early return when isV3 is true skips v2 cleanup; before
calling reconcileV3(ctx, stack, ledger, version) invoke a v2-cleanup routine
(e.g., cleanupV2Resources or similar) that explicitly deletes v2-only objects:
the old v2 StatefulSet(s), Services, the legacy "ledger" and "ledger-worker"
workloads, and the reindex CronJob; place this call immediately before the
return in the isV3 branch so reconcileV3 can assume v2 resources are removed
(use existing stack/ledger helpers or add a small helper function to perform the
deletes).
In `@internal/resources/ledgers/v3.go`:
- Around line 270-273: The addresses for ADVERTISE_ADDR and BOOTSTRAP_ADDR
currently hardcode the cluster domain (".cluster.local"); change them to not
include the fixed domain and instead either append only the ".svc" suffix or
read a configurable env var (e.g., CLUSTER_DOMAIN or KUBE_DNS_DOMAIN) and use
that when building the hostnames. Update the two format strings that construct
ADVERTISE_ADDR and BOOTSTRAP_ADDR (which use headlessSvc, POD_NAME,
POD_NAMESPACE, v3PortRaft, v3PortGRPC) to omit ".cluster.local" or use the env
var, and apply the same fix to other occurrences (e.g., the similar code in
internal/resources/ledgers/reindex.go).
- Around line 106-139: The StatefulSet reconcile currently rebuilds and sets
spec.VolumeClaimTemplates every run (via buildV3VolumeClaimTemplates and the
core.CreateOrUpdate for appsv1.StatefulSet), but Kubernetes treats
volumeClaimTemplates as immutable and will reject updates; remove volume claim
template reconciliation or make persistence settings create-only: stop updating
spec.VolumeClaimTemplates in the CreateOrUpdate callback (leave existing
templates untouched if the StatefulSet already exists) and/or stop watching
ledger.v3.persistence in internal/resources/ledgers/init.go so changes to
ledger.v3.persistence are treated as create-time-only, or alternatively
implement explicit PVC resize/migration logic outside the StatefulSet
reconciliation flow if you need to change storage after creation.
- Around line 308-327: The code currently uses resource.MustParse(sizeStr) when
building the PersistentVolumeClaim in the function that reads sizeStr from
settings.GetStringOrDefault; replace this with resource.ParseQuantity(sizeStr),
check the returned error, and return that error (or a wrapped validation error)
instead of allowing a panic, so the PVC construction
(corev1.PersistentVolumeClaim / corev1.PersistentVolumeClaimSpec) only proceeds
when the quantity is valid.
📒 Files selected for processing (2)
- internal/resources/ledgers/init.go
- internal/resources/ledgers/v3.go
internal/resources/ledgers/init.go (outdated)

```go
isV3 := !semver.IsValid(version) || semver.Compare(version, "v3.0.0-alpha") >= 0
if isV3 {
	return reconcileV3(ctx, stack, ledger, version)
}
```
Clean up v2-only resources before handing off to the v3 reconciler.
This early return skips every v2-side delete path. On an in-place v2→v3 upgrade, reconcileV3 only creates the new StatefulSet/Services, so old resources like the v2 ledger / ledger-worker workloads and the reindex CronJob can survive the version flip and keep running.
Suggested direction

```diff
 isV3 := !semver.IsValid(version) || semver.Compare(version, "v3.0.0-alpha") >= 0
 if isV3 {
+	if err := cleanupV2Resources(ctx, stack, ledger); err != nil {
+		return err
+	}
 	return reconcileV3(ctx, stack, ledger, version)
 }
```
internal/resources/ledgers/v3.go

```go
volumeClaims, err := buildV3VolumeClaimTemplates(ctx, stackName)
if err != nil {
	return err
}

podTemplate, err := buildV3PodTemplate(ctx, stack, ledger, image)
if err != nil {
	return err
}

headlessSvcName := "ledger-raft"
stsName := "ledger"

_, _, err = core.CreateOrUpdate[*appsv1.StatefulSet](ctx, types.NamespacedName{
	Name:      stsName,
	Namespace: stackName,
},
	func(t *appsv1.StatefulSet) error {
		t.Spec = appsv1.StatefulSetSpec{
			Replicas:            &replicas,
			ServiceName:         headlessSvcName,
			PodManagementPolicy: appsv1.OrderedReadyPodManagement,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{
					"app.kubernetes.io/name": "ledger",
				},
			},
			Template:             *podTemplate,
			VolumeClaimTemplates: volumeClaims,
		}
		return nil
	},
	core.WithController[*appsv1.StatefulSet](ctx.GetScheme(), ledger),
)
```
Persistence settings won't reconcile after StatefulSet is created.
internal/resources/ledgers/init.go watches Settings, but this code rebuilds spec.volumeClaimTemplates on each reconcile. Kubernetes treats volumeClaimTemplates as immutable on update and rejects spec changes outside a strict allowlist (replicas, ordinals, template, updateStrategy, revisionHistoryLimit, persistentVolumeClaimRetentionPolicy, minReadySeconds), so changing ledger.v3.persistence.* settings will fail with validation errors. Either remove persistence settings from the watch or enforce them as create-time-only with no further reconciliation, otherwise implement explicit PVC resize/migration logic outside the StatefulSet template.
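One way to make the templates create-time-only can be sketched with plain strings standing in for `corev1.PersistentVolumeClaim` values. Keying off an empty `resourceVersion` to detect "not yet persisted" is an assumption about what the `CreateOrUpdate` callback can see, not the operator's actual API:

```go
package main

import "fmt"

// reconcileVolumeClaimTemplates returns what should be written back to the
// StatefulSet spec: the desired templates only while the object is being
// created (no resourceVersion yet), and the existing ones untouched
// afterwards, since Kubernetes rejects updates to spec.volumeClaimTemplates.
func reconcileVolumeClaimTemplates(resourceVersion string, existing, desired []string) []string {
	if resourceVersion == "" { // object not persisted yet: safe to set
		return desired
	}
	return existing // update path: never mutate the immutable field
}

func main() {
	fmt.Println(reconcileVolumeClaimTemplates("", nil, []string{"wal", "data", "cold-cache"}))
	// prints: [wal data cold-cache]
	fmt.Println(reconcileVolumeClaimTemplates("12345", []string{"wal", "data"}, []string{"wal", "data", "cold-cache"}))
	// prints: [wal data]
}
```

With this shape, a change to the persistence settings simply stops propagating after first create, which matches the review's "treat as create-time-only" option; actual PVC expansion would still need separate logic.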
```go
fmt.Sprintf(`ADVERTISE_ADDR="${POD_NAME}.%s.${POD_NAMESPACE}.svc.cluster.local:%d"`, headlessSvc, v3PortRaft),
// First pod (ordinal 0) bootstraps if no checkpoint exists yet, otherwise normal start.
// Other pods join pod-0.
fmt.Sprintf(`BOOTSTRAP_ADDR="ledger-0.%s.${POD_NAMESPACE}.svc.cluster.local:%d"`, headlessSvc, v3PortGRPC),
```
Don't hardcode cluster.local into Raft peer addresses.
Kubernetes allows operators to configure a custom DNS cluster domain via the kubelet --cluster-domain flag (default is cluster.local). Building ADVERTISE_ADDR and BOOTSTRAP_ADDR as pod.svc.cluster.local will fail peer discovery on clusters using non-default domains. Make the cluster domain configurable or use namespace-local service names that don't rely on the domain suffix.
Note: This pattern is repeated elsewhere in the codebase (e.g., internal/resources/ledgers/reindex.go) and should be addressed consistently.
```go
sizeStr, err := settings.GetStringOrDefault(ctx, stackName, s.defaultSize, strings.Split(s.sizeKey, ".")...)
if err != nil {
	return nil, err
}

storageClass, err := settings.GetStringOrEmpty(ctx, stackName, strings.Split(s.storageClassKey, ".")...)
if err != nil {
	return nil, err
}

pvc := corev1.PersistentVolumeClaim{
	ObjectMeta: metav1.ObjectMeta{
		Name: s.name,
	},
	Spec: corev1.PersistentVolumeClaimSpec{
		AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
		Resources: corev1.VolumeResourceRequirements{
			Requests: corev1.ResourceList{
				corev1.ResourceStorage: resource.MustParse(sizeStr),
			},
```
Replace resource.MustParse with resource.ParseQuantity to avoid panics on invalid PVC size settings.
The sizeStr value comes from user-controlled settings via GetStringOrDefault, which means a typo in ledger.v3.persistence.*.size will cause the reconciler to panic instead of returning a validation error. Use resource.ParseQuantity to handle invalid input gracefully.
Safer parsing
```diff
 for _, s := range specs {
 	sizeStr, err := settings.GetStringOrDefault(ctx, stackName, s.defaultSize, strings.Split(s.sizeKey, ".")...)
 	if err != nil {
 		return nil, err
 	}
+	sizeQty, err := resource.ParseQuantity(sizeStr)
+	if err != nil {
+		return nil, fmt.Errorf("invalid %s value %q: %w", s.sizeKey, sizeStr, err)
+	}
 	storageClass, err := settings.GetStringOrEmpty(ctx, stackName, strings.Split(s.storageClassKey, ".")...)
 	if err != nil {
 		return nil, err
 	}
@@
 	Spec: corev1.PersistentVolumeClaimSpec{
 		AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
 		Resources: corev1.VolumeResourceRequirements{
 			Requests: corev1.ResourceList{
-				corev1.ResourceStorage: resource.MustParse(sizeStr),
+				corev1.ResourceStorage: sizeQty,
 			},
 		},
 	},
```
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/04-Modules/03-Ledger.md`:
- Around line 104-110: Update the "Ledger v3" section to clarify that the global
"Requirements" above apply to v2 (not v3) by adding a short note or adjusting
headings; specifically modify the "Ledger v3" heading/content to include a
sentence like "Note: the top-level requirements apply to Ledger v2 — Ledger v3
uses embedded Pebble storage and does not require PostgreSQL or a message
broker." Ensure you reference the "Ledger v3" section and the existing
"Requirements" wording so readers won't misconfigure installs.
📒 Files selected for processing (2)
- docs/04-Modules/03-Ledger.md
- docs/09-Configuration reference/01-Settings.md

✅ Files skipped from review due to trivial changes (1)
- docs/09-Configuration reference/01-Settings.md
docs/04-Modules/03-Ledger.md

```markdown
## Ledger v3

Ledger v3 is an architecturally different version that uses Raft consensus with Pebble embedded storage instead of PostgreSQL. When the version is `>= v3.0.0-alpha`, the operator deploys a **StatefulSet** instead of a Deployment.

### Requirements

Ledger v3 does **not** require PostgreSQL or a message broker. Storage is fully embedded (Pebble LSM).
```
Clarify version-scoped requirements to avoid contradictory guidance.
Lines 106 and 110 correctly describe v3, but the page still starts with global requirements that say PostgreSQL is required. Please add a short note in this new v3 section (or adjust headings) that the top requirements apply to v2, so readers don’t misconfigure installs.
- Add 20 Ginkgo integration tests for v3 StatefulSet reconciler
- Prefix all v3 settings with "module." for consistency
- Fix version gate to use semver.Major == "v3" (avoids breaking v2 tests)
- Remove redundant services.Create (GatewayHTTPAPI handles ClusterIP service)
- Update docs with correct settings key paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
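The reworked gate relies on `semver.Major(version) == "v3"` from `golang.org/x/mod/semver`. A stdlib-only stand-in shows why this avoids the earlier bug where invalid versions (e.g. `latest`) were treated as v3 — `major` here loosely mirrors `semver.Major` and is illustrative, not the module's implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// major extracts the "vN" major prefix of a semver string, returning ""
// for strings that do not start with "v<digits>" (as semver.Major does
// for invalid input).
func major(version string) string {
	if !strings.HasPrefix(version, "v") {
		return ""
	}
	rest := version[1:]
	i := 0
	for i < len(rest) && rest[i] >= '0' && rest[i] <= '9' {
		i++
	}
	if i == 0 {
		return ""
	}
	return "v" + rest[:i]
}

func main() {
	for _, v := range []string{"v3.0.0-alpha", "v3.1.2", "v2.2.0", "latest"} {
		fmt.Printf("%s -> isV3=%v\n", v, major(v) == "v3")
	}
	// prints:
	// v3.0.0-alpha -> isV3=true
	// v3.1.2 -> isV3=true
	// v2.2.0 -> isV3=false
	// latest -> isV3=false
}
```

The original gate (`!semver.IsValid(version) || ...`) made every non-semver tag take the v3 path, which is what broke the v2 tests; requiring a valid major of exactly `"v3"` fails closed instead.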
Actionable comments posted: 1
♻️ Duplicate comments (4)
internal/resources/ledgers/v3.go (3)
**300-318:** ⚠️ Potential issue | 🟠 Major — Avoid `resource.MustParse` on Settings input.

`sizeStr` is user-controlled here. A typo like `20GG` will panic this reconcile path instead of returning a validation error. The safer `resource.ParseQuantity` diff from the previous review round still applies.
**111-126:** ⚠️ Potential issue | 🟠 Major — `volumeClaimTemplates` can't be treated as a live-reconciled setting.

`internal/resources/ledgers/init.go` watches Settings changes, and this callback rewrites `t.Spec.VolumeClaimTemplates` on every reconcile. Kubernetes rejects StatefulSet updates that mutate `spec.volumeClaimTemplates`, so changing `module.ledger.v3.persistence.*` after the first create will wedge reconciliation. Either make these settings create-time-only or manage PVC expansion outside the StatefulSet template.
262-265: ⚠️ Potential issue | 🟠 Major
Don't hardcode `.cluster.local` into peer addresses.
Clusters can run with a non-default DNS suffix, so these FQDNs break Raft discovery outside `cluster.local`. Make the cluster domain configurable or derive addresses without baking in the suffix.
Can Kubernetes clusters use a DNS cluster domain other than cluster.local?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/resources/ledgers/v3.go` around lines 262 - 265, The peer FQDNs are hardcoded with ".cluster.local" which breaks clusters using a different DNS suffix; change the ADVERTISE_ADDR and BOOTSTRAP_ADDR string templates to not embed ".cluster.local" and instead append a cluster domain variable (e.g., use an env var like CLUSTER_DOMAIN or the Kubernetes-provided KUBERNETES_SERVICE_DNS_DOMAIN with a sensible fallback) when building the addresses; update the templates that reference ADVERTISE_ADDR and BOOTSTRAP_ADDR (using headlessSvc, v3PortRaft, v3PortGRPC, POD_NAME and POD_NAMESPACE) to include the cluster domain variable rather than the literal ".cluster.local" so the domain is configurable at runtime.

internal/resources/ledgers/init.go (1)
44-46: ⚠️ Potential issue | 🟠 Major
Clean up v2 resources before taking the v3 fast-path.
This early return still skips the existing v2 delete path, so an in-place v2→v3 upgrade can leave the old Deployment, worker, and reindex CronJob running beside the new StatefulSet.
🧹 Suggested direction
```diff
 isV3 := semver.IsValid(version) && semver.Major(version) == "v3"
 if isV3 {
+	if err := cleanupV2Resources(ctx, stack, ledger); err != nil {
+		return err
+	}
 	return reconcileV3(ctx, stack, ledger, version)
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/resources/ledgers/init.go` around lines 44 - 46, The early return for the v3 fast-path (the isV3 check using semver.IsValid/semver.Major that currently calls reconcileV3 and returns) skips the existing v2 delete path and leaves v2 resources running; before returning from the v3 branch, invoke the v2 cleanup/delete logic (the existing v2 delete path that removes the v2 Deployment, worker, and reindex CronJob) so v2 resources are removed during an in-place v2→v3 upgrade—i.e., call or reuse the v2 cleanup function/logic immediately before or inside the isV3 branch and only then proceed to return reconcileV3(ctx, stack, ledger, version).
🧹 Nitpick comments (2)
internal/tests/ledger_v3_controller_test.go (2)
163-169: Assert the generated `ledger` Service here too.
`reconcileV3` now relies on the GatewayHTTPAPI reconciler to materialize the ClusterIP Service, so only checking the CR leaves the 8080 → targetPort `http` / container port 9000 compatibility path untested.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/tests/ledger_v3_controller_test.go` around lines 163 - 169, The test currently only asserts the GatewayHTTPAPI CR but not the generated ClusterIP Service; update the test after loading the GatewayHTTPAPI (httpAPI) to also load the Service resource named core.GetObjectName(stack.Name, "ledger") into a v1.Service struct and assert that its spec.ports contains a port 8080 that targets port named "http" (or targetPort 9000) so the 8080→targetPort mapping created by the GatewayHTTPAPI reconciler (reconcileV3 path) is verified; use the same Eventually/LoadResource pattern (or Expect) as used for httpAPI and reference types GatewayHTTPAPI, reconcileV3, and the Service object to locate where to add the assertion.
186-204: Add the even-replica rejection case alongside this happy path.
`installV3StatefulSet` explicitly rejects even replica counts, but this suite never proves that validation. A `module.ledger.v3.replicas=4` case would lock in the behavior promised by the PR.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/tests/ledger_v3_controller_test.go` around lines 186 - 204, Add a parallel negative test that verifies even replica counts are rejected: create a new Context using the same replicasSetting variable but set value "4" for the key "module.ledger.v3.replicas" and in JustBeforeEach attempt to Create(replicasSetting) and assert it returns an error (or that no StatefulSet named "ledger" is ever created via LoadResource). Reference the existing symbols replicasSetting, Create, Delete, LoadResource and the installV3StatefulSet behavior to locate where to add an It block that expects creation to fail (or that *sts.Spec.Replicas is not set) when the value is even.
⛔ Files ignored due to path filters (1)
`config/rbac/role.yaml` is excluded by `!**/*.yaml`
📒 Files selected for processing (5)
- docs/04-Modules/03-Ledger.md
- docs/09-Configuration reference/01-Settings.md
- internal/resources/ledgers/init.go
- internal/resources/ledgers/v3.go
- internal/tests/ledger_v3_controller_test.go
✅ Files skipped from review due to trivial changes (2)
- docs/04-Modules/03-Ledger.md
- docs/09-Configuration reference/01-Settings.md
```go
It("Should NOT create a Database object", func() {
	Consistently(func() error {
		return LoadResource("", core.GetObjectName(stack.Name, "ledger"), &v1beta1.Database{})
	}).ShouldNot(Succeed())
})
```
Wait for a positive v3 signal before asserting Database absence.
This can pass before the controller has finished its first reconcile. First wait for the StatefulSet or headless Service to exist, then keep asserting that the Database object stays absent for a short window.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/tests/ledger_v3_controller_test.go` around lines 171 - 175, Modify
the "Should NOT create a Database object" test to first wait for a positive v3
signal (e.g., assert existence of the StatefulSet or the headless Service
created by the controller) before asserting Database absence: replace the
immediate Consistently call that uses LoadResource("",
core.GetObjectName(stack.Name, "ledger"), &v1beta1.Database{}) with a two-step
approach—1) Block until the StatefulSet or headless Service is observed (use the
test helper that GETs the StatefulSet/Service by name and expects it to
Eventually(Succeed)), then 2) run the Consistently loop that calls
LoadResource(..., &v1beta1.Database{}) and ShouldNot(Succeed()) to ensure the
Database remains absent for the short window; keep references to LoadResource,
core.GetObjectName(stack.Name, "ledger"), v1beta1.Database and the test "Should
NOT create a Database object" so the change is localized.
On scale-down, the preStop lifecycle hook:
1. Calls POST /_admin/deregister to remove the node from the Raft cluster
2. Cleans the WAL directory so future re-joins start as fresh learners

This ensures clean Raft membership management during StatefulSet scaling.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
♻️ Duplicate comments (4)
internal/resources/ledgers/v3.go (3)
323-342: ⚠️ Potential issue | 🟠 Major
Replace `resource.MustParse` for setting-derived PVC sizes.
`sizeStr` comes from Settings, so invalid values can panic the reconciler here. Use `resource.ParseQuantity` and return a validation error instead.
Safer quantity parsing
```diff
 for _, s := range specs {
 	sizeStr, err := settings.GetStringOrDefault(ctx, stackName, s.defaultSize, strings.Split(s.sizeKey, ".")...)
 	if err != nil {
 		return nil, err
 	}
+	sizeQty, err := resource.ParseQuantity(sizeStr)
+	if err != nil {
+		return nil, fmt.Errorf("invalid %s value %q: %w", s.sizeKey, sizeStr, err)
+	}
@@
 			Resources: corev1.VolumeResourceRequirements{
 				Requests: corev1.ResourceList{
-					corev1.ResourceStorage: resource.MustParse(sizeStr),
+					corev1.ResourceStorage: sizeQty,
 				},
 			},
 		},
 	}
```
```bash
#!/bin/bash
set -euo pipefail
# Show setting-derived size flow and parse call site
rg -n 'GetStringOrDefault\(.*persistence.*size|MustParse\(|ParseQuantity\(' internal/resources/ledgers/v3.go -C3
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/resources/ledgers/v3.go` around lines 323 - 342, Replace the call to resource.MustParse(sizeStr) inside the PVC creation (the code around sizeStr and the corev1.PersistentVolumeClaimSpec in v3.go) with safe parsing: call resource.ParseQuantity(sizeStr), handle the returned (q, err) and if err != nil return a validation error (wrap or return err) instead of panicking; ensure the parsed Quantity is used to populate the Requests map for corev1.ResourceStorage so reconciler never panics on invalid setting-derived sizes.
269-273: ⚠️ Potential issue | 🟠 Major
Do not hardcode `cluster.local` in peer addresses.
Using a fixed `.svc.cluster.local` breaks discovery on clusters with a custom DNS domain. Prefer namespace-local `.svc` addresses or make the cluster domain configurable.
Safer address construction
```diff
-	fmt.Sprintf(`ADVERTISE_ADDR="${POD_NAME}.%s.${POD_NAMESPACE}.svc.cluster.local:%d"`, headlessSvc, v3PortRaft),
+	fmt.Sprintf(`ADVERTISE_ADDR="${POD_NAME}.%s.${POD_NAMESPACE}.svc:%d"`, headlessSvc, v3PortRaft),
 ...
-	fmt.Sprintf(`BOOTSTRAP_ADDR="ledger-0.%s.${POD_NAMESPACE}.svc.cluster.local:%d"`, headlessSvc, v3PortGRPC),
+	fmt.Sprintf(`BOOTSTRAP_ADDR="ledger-0.%s.${POD_NAMESPACE}.svc:%d"`, headlessSvc, v3PortGRPC),
```
```bash
#!/bin/bash
set -euo pipefail
rg -n 'cluster\.local' internal/resources/ledgers/v3.go -C2
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/resources/ledgers/v3.go` around lines 269 - 273, The peer address strings currently hardcode ".svc.cluster.local" when building ADVERTISE_ADDR and BOOTSTRAP_ADDR; change them to avoid the fixed cluster domain by either using the namespace-local ".svc" suffix or reading a configurable CLUSTER_DOMAIN env var. Update the fmt.Sprintf calls that build ADVERTISE_ADDR and BOOTSTRAP_ADDR (the lines that reference headlessSvc, v3PortRaft and v3PortGRPC) to produce "%s.${POD_NAMESPACE}.svc:%d" or "%s.${POD_NAMESPACE}.svc.%s:%d" driven by CLUSTER_DOMAIN, and ensure any new env var is read early so the same variable is used for both ADVERTISE_ADDR and BOOTSTRAP_ADDR.
111-127: ⚠️ Potential issue | 🟠 Major
Avoid reconciling `volumeClaimTemplates` after StatefulSet creation.
`VolumeClaimTemplates` is effectively immutable for normal StatefulSet updates. Rebuilding it from Settings in this mutate block can cause forbidden updates once the StatefulSet exists.
```bash
#!/bin/bash
set -euo pipefail
# Verify StatefulSet mutate path is always assigning VolumeClaimTemplates
rg -n 'CreateOrUpdate\[\*appsv1.StatefulSet\]|VolumeClaimTemplates|buildV3VolumeClaimTemplates' internal/resources/ledgers/v3.go -C3
# Verify persistence settings are consumed by v3 reconciler
rg -n 'module\.ledger\.v3\.persistence|buildV3VolumeClaimTemplates' internal/resources/ledgers/v3.go -C2
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/resources/ledgers/v3.go` around lines 111 - 127, The mutate block currently unconditionally overwrites t.Spec.VolumeClaimTemplates causing forbidden updates; modify the closure passed to CreateOrUpdate[*appsv1.StatefulSet] so that you only assign VolumeClaimTemplates when the StatefulSet is being created or has no existing templates (e.g., check t.ObjectMeta.CreationTimestamp.IsZero() or len(t.Spec.VolumeClaimTemplates) == 0) and otherwise leave t.Spec.VolumeClaimTemplates untouched; continue to use buildV3VolumeClaimTemplates to construct the templates but only set them conditionally inside the function literal for t *appsv1.StatefulSet.

internal/tests/ledger_v3_controller_test.go (1)
185-189: ⚠️ Potential issue | 🟡 Minor
Wait for a positive v3 signal before asserting `Database` absence.
This assertion can pass before reconciliation starts. First wait for StatefulSet/headless Service creation, then run `Consistently` on `Database` absence.
Suggested test stabilization
```diff
 It("Should NOT create a Database object", func() {
+	sts := &appsv1.StatefulSet{}
+	Eventually(func() error {
+		return LoadResource(stack.Name, "ledger", sts)
+	}).Should(Succeed())
+
 	Consistently(func() error {
 		return LoadResource("", core.GetObjectName(stack.Name, "ledger"), &v1beta1.Database{})
 	}).ShouldNot(Succeed())
 })
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/tests/ledger_v3_controller_test.go` around lines 185 - 189, The test currently asserts Database absence immediately which can race with reconciliation; modify the test ("Should NOT create a Database object") to first wait for the positive v3 signal by polling for the created StatefulSet and headless Service (e.g., use LoadResource/Expect/Eventually against core.GetObjectName(stack.Name, "<statefulset-name>") and the headless Service) and only after those resources exist run the Consistently check calling LoadResource(..., &v1beta1.Database{}) to assert it does not appear; keep the same resource-name construction via core.GetObjectName and the existing Consistently block but gate it behind an initial Eventually/Wait for the StatefulSet and headless Service to stabilize.
🧹 Nitpick comments (1)
internal/tests/ledger_v3_controller_test.go (1)
200-218: Add a negative-path test for even replica settings.
Current coverage proves custom odd replicas, but not enforcement failure for even values. Add a `module.ledger.v3.replicas=4` case and assert reconcile does not produce the v3 StatefulSet.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/tests/ledger_v3_controller_test.go` around lines 200 - 218, Add a negative-path test that ensures even replica values are rejected: create a new Context similar to the existing "with custom replicas setting" but set replicasSetting = settings.New(uuid.NewString(), "module.ledger.v3.replicas", "4", stack.Name), call Create(replicasSetting) in JustBeforeEach and Delete(replicasSetting) in AfterEach, then assert reconcile does not produce the v3 StatefulSet by calling LoadResource(stack.Name, "ledger", sts) and expecting it to return an error or for Eventually(func(g Gomega) error { return LoadResource(stack.Name, "ledger", sts) }).ShouldNot(Succeed()) (or equivalent negative assertion) so the test verifies no StatefulSet is created for an even replica value.
📒 Files selected for processing (2)
- internal/resources/ledgers/v3.go
- internal/tests/ledger_v3_controller_test.go
```go
if replicas%2 == 0 {
	return fmt.Errorf("module.ledger.v3.replicas must be odd, got %d", replicas)
}
```
Reject non-positive replica counts explicitly.
Odd-only validation allows invalid negative values (e.g., -1). Enforce replicas > 0 together with oddness to fail fast with a clear message.
Proposed guard
```diff
-	if replicas%2 == 0 {
-		return fmt.Errorf("module.ledger.v3.replicas must be odd, got %d", replicas)
+	if replicas <= 0 || replicas%2 == 0 {
+		return fmt.Errorf("module.ledger.v3.replicas must be a positive odd number, got %d", replicas)
 	}
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-if replicas%2 == 0 {
-	return fmt.Errorf("module.ledger.v3.replicas must be odd, got %d", replicas)
-}
+if replicas <= 0 || replicas%2 == 0 {
+	return fmt.Errorf("module.ledger.v3.replicas must be a positive odd number, got %d", replicas)
+}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/resources/ledgers/v3.go` around lines 94 - 96, The current
validation only enforces oddness and allows non-positive values like -1; update
the guard around the replicas variable to first check replicas > 0 and return a
clear error (e.g., "module.ledger.v3.replicas must be positive, got %d") before
checking oddness, so the function handling replica validation (the block using
the replicas variable) fails fast with an explicit positive-count error and then
retains the existing oddness check.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Driven by the `Settings` CRD — no changes to the `Ledger` CRD type.

What's new
- `StatefulSet/ledger` with `OrderedReady` policy
- `Service/ledger-raft` (headless)
- `Service/ledger` (ClusterIP)
- PVCs: `wal` (5Gi), `data` (10Gi), `cold-cache` (10Gi)

Settings keys
- `ledger.v3.replicas`
- `ledger.v3.cluster-id`
- `ledger.v3.persistence.{wal,data,cold-cache}.size`
- `ledger.v3.persistence.{wal,data,cold-cache}.storage-class`
- `ledger.v3.pebble.*`
- `ledger.v3.raft.*`

Version gate
v2 code path is completely untouched.
Test plan
- `go build ./...` and `go vet ./...` pass
- Applied a `Ledger` CR with `version: v3.0.0` in a Stack, verify StatefulSet, headless Service, ClusterIP Service, and PVCs are created

🤖 Generated with Claude Code