caching: gate runtime vlog maintenance by sync phase #845
snissn wants to merge 16 commits into pr/vlog-maint-v2-runtime-debt from
Conversation
Pull request overview
This PR introduces an explicit maintenance “phase” concept (restore/catchup/steady) and uses it to gate background value-log (vlog) generation maintenance, with a trace bridge so existing treedbtrace phase events can drive TreeDB’s maintenance phase during real-node runs.
Changes:
- Add a MaintenancePhase API to TreeDB (re-exported from the caching DB) and expose it in stats.
- Bridge treedbtrace phases into treedb.DB.SetMaintenancePhase() via a global phase bus tied to wrapped DB lifetimes.
- Gate periodic + checkpoint-kick vlog generation maintenance by maintenance phase, and expand rewrite planning/state to support staged confirmation and age-blocked reporting.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| kvstore/adapters/treedbtrace/treedbtrace.go | Registers/unregisters wrapped *treedb.DB with a phase bus so trace phase changes propagate into maintenance phase. |
| kvstore/adapters/treedbtrace/trace.go | Adds the phase bus, phase normalization/mapping, and CurrentTracePhase(). |
| kvstore/adapters/treedbtrace/phase_bridge_test.go | Verifies trace-phase propagation, apply-on-wrap, and unregister-on-close behavior. |
| TreeDB/public.go | Re-exports MaintenancePhase and adds DB.MaintenancePhase() / DB.SetMaintenancePhase(). |
| TreeDB/maintenance_phase_test.go | Confirms TreeDB forwards maintenance phase to cached DB and exposes it via stats. |
| TreeDB/db/vlog_rewrite.go | Adds age-blocked selection stats to rewrite plans and refactors sparse source selection / live-estimate decisioning. |
| TreeDB/caching/db.go | Implements phase state, stats export, phase-gated periodic maintenance, and phase-gated checkpoint kick. |
| TreeDB/caching/vlog_generation_state.go | Persists staged-rewrite confirmation state and adds helper paths for staging/restaging. |
| TreeDB/caching/vlog_generation_state_test.go | Updates loader tests for new stage fields returned by the loader. |
| TreeDB/caching/vlog_generation_scheduler_test.go | Updates WAL-on checkpoint-kick expectations and adds tests for phase-gated maintenance behavior. |
| TreeDB/caching/vlog_generation_checkpoint_kick_walon_test.go | Adds a focused WAL-on checkpoint-kick skip test in caching package. |
```go
func (db *DB) maybeRunPeriodicVlogGenerationMaintenance(runGC bool) bool {
	if db == nil {
		return false
	}
	if db.suppressBackgroundVlogGenerationForMaintenancePhase() {
		db.debugVlogMaintf("periodic_skip reason=maintenance_phase phase=%s run_gc=%t", maintenancePhaseString(uint32(db.MaintenancePhase())), runGC)
		return false
	}
	db.maybeRunVlogGenerationMaintenance(runGC)
	return true
}
```
Maintenance-phase gating is currently applied to the periodic loop and checkpoint-kick, but deferred/background paths (e.g., stage-confirm/age-blocked wakes via startVlogGenerationDeferredMaintenance and scheduleVlogGenerationRewriteStageConfirmation) still call maybeRunVlogGenerationMaintenanceWithOptions without checking MaintenancePhase(). That means restore/catchup can still run generation maintenance despite the new phase gate. Consider moving the phase check into maybeRunVlogGenerationMaintenanceWithOptions (or into the deferred wake/scheduler entrypoints) so all background triggers are consistently suppressed when phase != steady.
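A minimal sketch of the suggested consolidation, assuming the WithOptions variant is the common funnel for every background trigger (the options type and the inner run call are illustrative):

```go
func (db *DB) maybeRunVlogGenerationMaintenanceWithOptions(opts vlogGenerationMaintenanceOptions) {
	if db == nil {
		return
	}
	// Gate all background vlog generation maintenance in one place so the
	// periodic loop, checkpoint kicks, and deferred/stage-confirm wakes are
	// suppressed consistently outside the steady phase.
	if db.suppressBackgroundVlogGenerationForMaintenancePhase() {
		db.debugVlogMaintf("maintenance_skip reason=maintenance_phase source=%s phase=%s",
			opts.debugSource, maintenancePhaseString(uint32(db.MaintenancePhase())))
		return
	}
	db.runVlogGenerationMaintenance(opts) // illustrative inner call
}
```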
```go
time.Sleep(2 * schedulerTestWait(t))
_, rewriteCalls := recorder.recordedRewrite()
_, planCalls := recorder.recordedPlan()
if rewriteCalls != 0 || planCalls != 0 {
	t.Fatalf("checkpoint kick unexpectedly ran under WAL-on: planCalls=%d rewriteCalls=%d", planCalls, rewriteCalls)
```
This test uses a fixed sleep to assert that no checkpoint-kick ran. Negative assertions via time.Sleep can be flaky and can miss late-arriving goroutines. If possible, prefer checking a deterministic signal (e.g., vlogGenerationCheckpointKickRuns/Active remain 0, or have the recorder expose a channel that would fire on any plan/rewrite call and select with a timeout).
Suggested change:
```go
// Monitor for any unexpected plan/rewrite activity after the checkpoint.
done := make(chan struct{})
go func() {
	for {
		_, rewriteCalls := recorder.recordedRewrite()
		_, planCalls := recorder.recordedPlan()
		if rewriteCalls != 0 || planCalls != 0 {
			close(done)
			return
		}
		time.Sleep(10 * time.Millisecond)
	}
}()
select {
case <-done:
	// Re-read to report current counts.
	_, rewriteCalls := recorder.recordedRewrite()
	_, planCalls := recorder.recordedPlan()
	t.Fatalf("checkpoint kick unexpectedly ran under WAL-on: planCalls=%d rewriteCalls=%d", planCalls, rewriteCalls)
case <-time.After(2 * schedulerTestWait(t)):
	// Success: no checkpoint kick occurred within the wait window.
}
```
```go
err := d.inner.Close()
if d.phase != nil {
	phaseBus.unregister(d.phase)
}
if t := getTrace(); t != nil {
	t.closeDB()
}
```
In Close(), the adapter unregisters from phaseBus only after calling d.inner.Close(). That leaves a window where SetTracePhase can still call SetMaintenancePhase on a DB that is closing/closed (and if Close() returns an error but leaves the DB usable, it gets unregistered anyway). Consider unregistering from phaseBus before closing the inner DB (and optionally re-registering on close failure if that’s meaningful).
Suggested change:
```go
if d.phase != nil {
	phaseBus.unregister(d.phase)
}
if t := getTrace(); t != nil {
	t.closeDB()
}
err := d.inner.Close()
```
TreeDB/db/vlog_rewrite.go (Outdated)
```go
func selectRewriteSourceSegments(opts ValueLogRewriteOnlineOptions, files map[uint32]*valuelog.File, active map[uint32]struct{}, liveByID map[uint32]int64) map[uint32]struct{} {
	if len(opts.SourceFileIDs) > 0 {
func selectRewriteSourceSegments(opts ValueLogRewriteOnlineOptions, files map[uint32]*valuelog.File, active map[uint32]struct{}, liveByID map[uint32]int64) (map[uint32]struct{}, rewriteSourceSelectionStats) {
```
selectRewriteSourceSegments now returns two values, but there are existing call sites in the repo (including TreeDB/db/vlog_rewrite_test.go) that still expect a single return value. This will break compilation unless all callers are updated to handle the added rewriteSourceSelectionStats result (even if they ignore it).
Suggested change:
```go
func selectRewriteSourceSegments(opts ValueLogRewriteOnlineOptions, files map[uint32]*valuelog.File, active map[uint32]struct{}, liveByID map[uint32]int64) map[uint32]struct{} {
	selected, _ := selectRewriteSourceSegmentsWithStats(opts, files, active, liveByID)
	return selected
}

func selectRewriteSourceSegmentsWithStats(opts ValueLogRewriteOnlineOptions, files map[uint32]*valuelog.File, active map[uint32]struct{}, liveByID map[uint32]int64) (map[uint32]struct{}, rewriteSourceSelectionStats) {
```
TreeDB/db/vlog_rewrite.go (Outdated)
```go
if len(opts.SourceFileIDs) > 0 && liveByID == nil {
	selected := make(map[uint32]struct{}, len(opts.SourceFileIDs))
	for _, id := range opts.SourceFileIDs {
		if _, ok := files[id]; !ok {
			continue
		}
		selected[id] = struct{}{}
	}
	return selected
	return selected, stats
```
When SourceFileIDs are provided and liveByID is nil, this early return ignores the provided active set (and also bypasses the protectedIDs/minSegmentAge filtering below). That makes it possible for explicitly requested segments to include the currently-active segment(s), which the rest of the selection logic explicitly excludes. Suggest applying the same active/protected/min-age checks for SourceFileIDs selection, and removing the later (currently unreachable) in-loop branch that checks the same condition.
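A hedged sketch of that filtering for explicit SourceFileIDs, reusing the exclusions from the general path; protectedIDs, minSegmentAge, now, and stats follow the surrounding selection code and are otherwise assumptions:

```go
selected := make(map[uint32]struct{}, len(opts.SourceFileIDs))
for _, id := range opts.SourceFileIDs {
	f, ok := files[id]
	if !ok {
		continue
	}
	if _, isActive := active[id]; isActive {
		continue // never select the currently-active segment
	}
	if _, isProtected := protectedIDs[id]; isProtected {
		continue
	}
	if minSegmentAge > 0 && f.Path != "" {
		if info, err := os.Stat(f.Path); err == nil && now.Sub(info.ModTime()) < minSegmentAge {
			continue // still inside the age gate
		}
	}
	selected[id] = struct{}{}
}
return selected, stats
```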
💡 Codex Review
Reviewed commit: b67d6d8cdc
```go
if db.suppressBackgroundVlogGenerationForMaintenancePhase() {
	db.debugVlogMaintf("checkpoint_kick_skip reason=maintenance_phase phase=%s", maintenancePhaseString(uint32(db.MaintenancePhase())))
	return
```
Clear pending checkpoint retries in non-steady phases
This early return skips new checkpoint kicks during restore/catchup, but it leaves any previously-set vlogGenerationCheckpointKickPending bit untouched. If that bit was set before the phase transition (for example via a maintenance collision), schedulePendingVlogGenerationCheckpointKick can still launch retry maintenance passes in non-steady phases, so background rewrite/GC work still runs while the phase gate is supposed to suppress it.
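A minimal sketch of that fix, clearing the queued retry bit whenever the gate skips a kick (this mirrors the Store(false) the later revision adds in its maintenance-skip branch):

```go
if db.suppressBackgroundVlogGenerationForMaintenancePhase() {
	// Drop any queued retry so schedulePendingVlogGenerationCheckpointKick
	// cannot relaunch maintenance while the phase gate is active.
	db.vlogGenerationCheckpointKickPending.Store(false)
	db.debugVlogMaintf("checkpoint_kick_skip reason=maintenance_phase phase=%s",
		maintenancePhaseString(uint32(db.MaintenancePhase())))
	return
}
```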
```go
if len(opts.SourceFileIDs) == 0 {
	return true
}
return opts.MinSegmentStaleRatio > 0 || opts.MinSegmentStaleBytes > 0
```
Keep plan and execution filters consistent for explicit sources
ValueLogRewritePlan now treats SourceFileIDs + MinSegmentStale* as requiring live-byte estimation, so planning can filter out explicit IDs by stale thresholds. ValueLogRewriteOnline still does not estimate live bytes when SourceFileIDs is non-empty, so execution can still rewrite all explicit IDs. This creates plan/execute divergence for the same options and can cause rewrites that the plan said would be skipped.
💡 Codex Review
Reviewed commit: 9f4c306770
```go
for _, id := range opts.SourceFileIDs {
	if _, ok := files[id]; !ok {
		continue
	}
	candidateFileIDs = append(candidateFileIDs, id)
```
Deduplicate explicit source IDs before budgeted selection
The new explicit-source branch now appends every SourceFileIDs entry directly into candidateFileIDs and runs through the shared byte-budget loop, so duplicate IDs are processed multiple times. Because selection accounting adds candidate.liveBytes per candidate, duplicates can consume MaxSourceBytes repeatedly and cause later distinct IDs to be skipped (for example, [1,1,2] can select only {1} under a tight byte cap). This regresses explicit-ID rewrites by silently omitting requested segments when duplicates are present.
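A short sketch of the deduplication, assuming candidateFileIDs feeds the shared byte-budget loop as in the quoted snippet:

```go
seen := make(map[uint32]struct{}, len(opts.SourceFileIDs))
for _, id := range opts.SourceFileIDs {
	if _, ok := files[id]; !ok {
		continue
	}
	if _, dup := seen[id]; dup {
		continue // skip duplicates so they cannot consume MaxSourceBytes twice
	}
	seen[id] = struct{}{}
	candidateFileIDs = append(candidateFileIDs, id)
}
```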
```go
b.mu.Lock()
b.dbs[db]++
b.mu.Unlock()
db.SetMaintenancePhase(maintenancePhaseForTrace(b.current()))
}
```
Serialize phase registration with phase updates
register records the DB in b.dbs, unlocks, and only then applies b.current() to the DB. If SetTracePhase runs in that window, set() can already push the new phase to this DB and register can immediately overwrite it with a stale phase value. In that race, a freshly wrapped DB may stay in steady during restore/catchup until another phase change arrives, which defeats the maintenance-phase gate for that instance.
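One hedged way to close the race is to push the initial phase while still holding the bus mutex; the receiver type and the locked phase field are illustrative (the quoted code reads the phase via b.current() instead):

```go
func (b *tracePhaseBus) register(db *treedb.DB) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.dbs[db]++
	// Applying the phase under the same lock that SetTracePhase holds means a
	// concurrent phase change cannot be overwritten by a stale initial value.
	db.SetMaintenancePhase(maintenancePhaseForTrace(b.phase))
}
```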
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
```go
minStaleBytes := opts.MinSegmentStaleBytes
maxSourceSegments := opts.MaxSourceSegments
maxSourceBytes := opts.MaxSourceBytes
minSegmentAge := opts.MinSegmentAge
```
In selectRewriteSourceSegmentsWithStats, MaxSourceSegments/MaxSourceBytes are applied even when opts.SourceFileIDs is non-empty (explicit source selection). The option docs above state these caps apply only when SourceFileIDs is empty, so this changes API semantics and can unexpectedly drop explicitly requested segments. Consider ignoring maxSourceSegments/maxSourceBytes when explicitSources is true (or update the option documentation and call sites accordingly).
Suggested change:
```go
minSegmentAge := opts.MinSegmentAge
explicitSources := len(opts.SourceFileIDs) > 0
if explicitSources {
	// When explicit source file IDs are provided, ignore global caps so that
	// we do not drop any explicitly requested segments. The option
	// documentation states that MaxSourceSegments/MaxSourceBytes only apply
	// when SourceFileIDs is empty.
	maxSourceSegments = 0
	maxSourceBytes = 0
}
```
```go
if minSegmentAge > 0 && f.Path != "" {
	if info, err := os.Stat(f.Path); err == nil {
		if age := now.Sub(info.ModTime()); age < minSegmentAge {
			liveBytes := liveByID[id]
			if liveBytes < 0 {
				liveBytes = 0
			}
			if liveBytes > size {
				liveBytes = size
			}
			staleBytes := size - liveBytes
			stats.ageBlockedSegments++
			stats.ageBlockedBytesTotal += size
			stats.ageBlockedBytesLive += liveBytes
			stats.ageBlockedBytesStale += staleBytes
```
AgeBlockedBytesLive/AgeBlockedBytesStale are computed from liveByID even when liveByID is nil (no live-byte estimation). In that case liveByID[id] reads as 0, which will record the entire segment as stale and can make the plan’s AgeBlocked* stats misleading. Consider treating live bytes as unknown here (e.g., skip live/stale accounting) or default to liveBytes=size when no estimate is available, consistent with the explicitSources/liveByID==nil candidate path below.
💡 Codex Review
gomap/TreeDB/db/vlog_rewrite.go
Lines 988 to 991 in 6066640
ValueLogRewriteOnlineOptions documents that MaxSourceSegments/MaxSourceBytes apply only when SourceFileIDs is empty, but after this refactor explicit IDs now flow through the shared selection loop and hit these cap checks. That means callers who provide SourceFileIDs (for a precise rewrite set) can now silently lose requested segments when a non-zero cap is present, e.g. from a reused options struct, producing incomplete rewrites that contradict the API contract.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
```go
if minSegmentAge > 0 && f.Path != "" {
	if info, err := os.Stat(f.Path); err == nil {
		if age := now.Sub(info.ModTime()); age < minSegmentAge {
			liveBytes := liveByID[id]
			if liveBytes < 0 {
				liveBytes = 0
			}
			if liveBytes > size {
				liveBytes = size
			}
			staleBytes := size - liveBytes
			stats.ageBlockedSegments++
			stats.ageBlockedBytesTotal += size
			stats.ageBlockedBytesLive += liveBytes
			stats.ageBlockedBytesStale += staleBytes
			remaining := minSegmentAge - age
```
When MinSegmentAge excludes a segment, the AgeBlockedBytesLive/BytesStale stats are computed from liveByID[id]. If liveByID is nil (e.g., explicit SourceFileIDs without stale filters), the nil-map lookup returns 0, so age-blocked segments are reported as 0 live / 100% stale. Consider only populating the live/stale breakdown when liveByID != nil, or treating liveBytes as size (unknown) when no estimate is available, so the plan stats don’t mislead callers.
TreeDB/caching/db.go (Outdated)
```go
db.vlogGenerationCheckpointKickPending.Store(false)
if opts.debugSource != "" {
	db.debugVlogMaintf(
		"maintenance_skip reason=maintenance_phase source=%s phase=%s checkpoint_pending=%t deferred_pending=%t",
		opts.debugSource,
		maintenancePhaseString(uint32(db.MaintenancePhase())),
		db.vlogGenerationCheckpointKickPending.Load(),
```
In the maintenance-phase suppression branch, checkpoint_pending is cleared before logging, so the debug line will always print checkpoint_pending=false even if it was true on entry. Capture the previous pending value before clearing (or log that it was cleared) to keep the debug output accurate for diagnosing why a kick was dropped.
Suggested change:
```go
prevCheckpointPending := db.vlogGenerationCheckpointKickPending.Swap(false)
if opts.debugSource != "" {
	db.debugVlogMaintf(
		"maintenance_skip reason=maintenance_phase source=%s phase=%s checkpoint_pending=%t deferred_pending=%t",
		opts.debugSource,
		maintenancePhaseString(uint32(db.MaintenancePhase())),
		prevCheckpointPending,
```
```go
time.Sleep(2 * schedulerTestWait(t))
_, rewriteCalls := recorder.recordedRewrite()
_, planCalls := recorder.recordedPlan()
if rewriteCalls != 0 || planCalls != 0 {
	t.Fatalf("checkpoint kick unexpectedly ran under WAL-on: planCalls=%d rewriteCalls=%d", planCalls, rewriteCalls)
}
```
This test now uses a fixed sleep to assert the checkpoint kick doesn’t run under WAL-on. Since the WAL-on guard should prevent scheduling the kick at all, you can make this faster and less timing-dependent by asserting kick state/counters (e.g., vlogGenerationCheckpointKickRuns/Active/Pending and recorder calls) immediately after Checkpoint, instead of sleeping.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
TreeDB/caching/vlog_generation_state.go:184
- The PR adds persisted stage fields (RewriteStagePending/RewriteStageObserved), but the updated tests only assert the default zero values on load. Adding a round-trip test that calls saveValueLogGenerationRewriteState(..., stagePending=true, stageObservedAt != 0) and then reloads to assert the fields persist would better protect this new behavior from regressions.
```go
func saveValueLogGenerationRewriteState(path string, ids []uint32, ledger []backenddb.ValueLogRewritePlanSegment, penalties map[uint32]valueLogGenerationRewritePenalty, stagePending bool, stageObservedAt int64) error {
	raw, err := loadValueLogGenerationState(path)
	if err != nil {
		return err
	}
```
```go
	inner kvstore.DB
	phase *treedb.DB
```
The new field name phase is misleading because it holds a *treedb.DB (not the current phase value). Renaming it to something like phaseDB, treedbDB, or maintenanceDB would make the lifecycle/register/unregister intent much clearer.
```go
if db.suppressBackgroundVlogGenerationForMaintenancePhase() {
	db.vlogGenerationCheckpointKickPending.Store(false)
	if opts.debugSource != "" {
		db.debugVlogMaintf(
			"maintenance_skip reason=maintenance_phase source=%s phase=%s checkpoint_pending=%t deferred_pending=%t",
			opts.debugSource,
			maintenancePhaseString(uint32(db.MaintenancePhase())),
			db.vlogGenerationCheckpointKickPending.Load(),
			db.vlogGenerationDeferredMaintenancePending.Load(),
		)
	}
	return
}
```
This debug log always prints checkpoint_pending=false because it loads the value after forcibly clearing it. If the intent is to help diagnose why work was deferred/skipped, capture the prior pending value before Store(false) and log that instead.
```go
	raw.RewriteStagePending = stagePending
	raw.RewriteStageObserved = stageObservedAt
	return saveValueLogGenerationState(path, raw)
}
```
The PR adds persisted stage fields (RewriteStagePending / RewriteStageObserved), but the updated tests only assert the default zero-values on load. Adding a round-trip test that calls saveValueLogGenerationRewriteState(..., stagePending=true, stageObservedAt!=0) and then reloads to assert the fields persist would better protect this new behavior from regressions.
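A hedged sketch of such a round-trip test; the state file name, the nil values passed for ids/ledger/penalties, and whether save can start from an absent state file are assumptions:

```go
func TestVlogGenerationRewriteStageFieldsRoundTrip(t *testing.T) {
	path := filepath.Join(t.TempDir(), "vlog_generation_state.json") // assumed file name
	if err := saveValueLogGenerationRewriteState(path, nil, nil, nil, true, 12345); err != nil {
		t.Fatalf("save: %v", err)
	}
	raw, err := loadValueLogGenerationState(path)
	if err != nil {
		t.Fatalf("load: %v", err)
	}
	if !raw.RewriteStagePending || raw.RewriteStageObserved != 12345 {
		t.Fatalf("stage fields not persisted: pending=%t observed=%d",
			raw.RewriteStagePending, raw.RewriteStageObserved)
	}
}
```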
💡 Codex Review
Reviewed commit: 0bf23e5b92
```go
if len(opts.SourceFileIDs) == 0 {
	return opts.MinSegmentStaleRatio > 0 || opts.MinSegmentStaleBytes > 0 || opts.MaxSourceSegments > 0 || opts.MaxSourceBytes > 0
}
```
Require live-byte estimation for MinSegmentAge-only selection
When callers set only MinSegmentAge (no stale/byte caps and no explicit SourceFileIDs), this helper returns false, so ValueLogRewriteOnline/ValueLogRewritePlan skip estimateValueLogLiveBytesBySegment. In selectRewriteSourceSegmentsWithStats, non-explicit candidates then read liveByID[id] as zero and are all filtered out (liveBytes == 0), causing the rewrite pass to become a no-op instead of selecting old-enough segments. This regresses the new age-only sparse-selection mode and can silently disable maintenance for callers relying on MinSegmentAge as their only selector.
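A minimal sketch of one fix, extending the quoted predicate so age-only selection still triggers live-byte estimation:

```go
if len(opts.SourceFileIDs) == 0 {
	return opts.MinSegmentStaleRatio > 0 ||
		opts.MinSegmentStaleBytes > 0 ||
		opts.MaxSourceSegments > 0 ||
		opts.MaxSourceBytes > 0 ||
		opts.MinSegmentAge > 0 // age-only selection also needs live-byte estimates
}
```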
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
```go
if age := now.Sub(info.ModTime()); age < minSegmentAge {
	liveBytes := liveByID[id]
	if liveBytes < 0 {
		liveBytes = 0
	}
	if liveBytes > size {
		liveBytes = size
	}
	staleBytes := size - liveBytes
	stats.ageBlockedSegments++
	stats.ageBlockedBytesTotal += size
	stats.ageBlockedBytesLive += liveBytes
	stats.ageBlockedBytesStale += staleBytes
```
When liveByID is nil (e.g., MinSegmentAge-only selection, or explicit SourceFileIDs without stale filters), liveByID[id] reads as 0 and makes the AgeBlocked live/stale stats look like the segment is 100% stale. If these plan stats are surfaced/used operationally, this will be misleading. Consider handling liveByID == nil explicitly here (e.g., record only total bytes, or pessimistically/optimistically set liveBytes=size), and document the chosen semantics.
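A hedged sketch of the first option, recording the live/stale split only when an estimate actually exists (names follow the quoted snippet):

```go
stats.ageBlockedSegments++
stats.ageBlockedBytesTotal += size
if liveByID != nil {
	liveBytes := liveByID[id]
	if liveBytes < 0 {
		liveBytes = 0
	}
	if liveBytes > size {
		liveBytes = size
	}
	stats.ageBlockedBytesLive += liveBytes
	stats.ageBlockedBytesStale += size - liveBytes
}
```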
TreeDB/caching/db.go (Outdated)
```go
db.vlogGenerationCheckpointKickPending.Store(false)
if opts.debugSource != "" {
	db.debugVlogMaintf(
		"maintenance_skip reason=maintenance_phase source=%s phase=%s checkpoint_pending=%t deferred_pending=%t",
		opts.debugSource,
		maintenancePhaseString(uint32(db.MaintenancePhase())),
		db.vlogGenerationCheckpointKickPending.Load(),
		db.vlogGenerationDeferredMaintenancePending.Load(),
	)
}
```
This debug log always prints checkpoint_pending=false because the flag is cleared immediately before Load(). If the intent is to aid diagnosis, capture the prior value (and optionally deferred-pending) before clearing, log that, then apply the state change. That keeps logs accurate and makes phase gating easier to reason about.
Suggested change:
```go
// Capture prior pending flags for accurate diagnostics before clearing.
priorCheckpointPending := db.vlogGenerationCheckpointKickPending.Load()
priorDeferredPending := db.vlogGenerationDeferredMaintenancePending.Load()
if opts.debugSource != "" {
	db.debugVlogMaintf(
		"maintenance_skip reason=maintenance_phase source=%s phase=%s checkpoint_pending=%t deferred_pending=%t",
		opts.debugSource,
		maintenancePhaseString(uint32(db.MaintenancePhase())),
		priorCheckpointPending,
		priorDeferredPending,
	)
}
db.vlogGenerationCheckpointKickPending.Store(false)
```
```go
time.Sleep(2 * schedulerTestWait(t))
_, rewriteCalls := recorder.recordedRewrite()
_, planCalls := recorder.recordedPlan()
if rewriteCalls != 0 || planCalls != 0 {
	t.Fatalf("checkpoint kick unexpectedly ran under WAL-on: planCalls=%d rewriteCalls=%d", planCalls, rewriteCalls)
```
Relying on a fixed Sleep to assert that something did not happen can be flaky and can also miss delayed async work (e.g., a kick firing just after the sleep). A more robust pattern here is to poll until a deadline and fail immediately if calls become non-zero, and/or assert the kick state flags (pending/active) are cleared in WAL-on mode to prove no delayed goroutine will fire.
Suggested change:
```go
deadline := time.Now().Add(2 * schedulerTestWait(t))
for time.Now().Before(deadline) {
	_, rewriteCalls := recorder.recordedRewrite()
	_, planCalls := recorder.recordedPlan()
	if rewriteCalls != 0 || planCalls != 0 {
		t.Fatalf("checkpoint kick unexpectedly ran under WAL-on: planCalls=%d rewriteCalls=%d", planCalls, rewriteCalls)
	}
	time.Sleep(schedulerTestWait(t) / 10)
```
```go
const (
	keys      = 50_000
	batchSize = 2_000
)
```
This test adds a fairly large workload (50k keys + churn) and runs it for two profiles, which can significantly increase CI wall time and/or cause timeouts in slower environments. Consider gating with testing.Short() (skip/reduce size when short), or lowering the key count while keeping the same invariants, so the regression signal remains without a large runtime cost.
Suggested change:
```go
keys := 50_000
batchSize := 2_000
if testing.Short() {
	keys = 5_000
}
```
```go
type DB struct {
	inner kvstore.DB
	phase *treedb.DB
}
```
The new field name phase is ambiguous (it holds a *treedb.DB, not the phase value). Renaming to something like phaseDB, treedbDB, or maintenancePhaseDB would make it clearer what’s being registered/unregistered with the phase bus.
💡 Codex Review
Lines 12521 to 12525 in 0efea40
When a stage-confirmation pass runs and ValueLogRewritePlan returns no SourceFileIDs, this branch only updates age-blocked state and leaves vlogGenerationRewriteStagePending/...ObservedUnixNano untouched. Because vlogGenerationRewriteStageConfirmDue() then remains true indefinitely, scheduleDueVlogGenerationDeferredMaintenance() keeps re-queuing rewrite_stage_confirm_exit immediately, so the deferred worker can repeatedly re-enter planning without waiting for new budget/data. This can create a tight maintenance retry loop (high CPU / repeated tree scans) whenever a previously staged ledger can no longer be confirmed.
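One hedged sketch of an exit path that clears the staged marker when confirmation finds nothing, so the deferred scheduler stops re-queuing immediately; the full field names and the age-blocked bookkeeping helper are illustrative:

```go
if len(plan.SourceFileIDs) == 0 {
	db.updateVlogGenerationRewriteAgeBlockedState(plan) // hypothetical helper for the existing age-blocked bookkeeping
	// Confirmation cannot succeed with the current data; stop treating the
	// stage as due so rewrite_stage_confirm_exit is not re-queued in a loop.
	db.vlogGenerationRewriteStagePending.Store(false)
	db.vlogGenerationRewriteStageObservedUnixNano.Store(0)
	return
}
```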
```go
		db.schedulePendingVlogGenerationCheckpointKick()
	} else {
		db.vlogGenerationCheckpointKickPending.Store(false)
	}
}
```
Abort active maintenance when leaving steady phase
This phase transition handler only records the new phase and clears pending checkpoint kicks, but it does not stop an already-running maintenance pass. Since the non-steady phase guard is checked only before acquiring vlogGenerationMaintenanceActive, a rewrite/GC that started in steady will continue through restore/catchup if the phase flips mid-run. In that race, background maintenance still executes during phases that are meant to suppress it.
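A hedged sketch of a mid-run guard, re-checking the phase gate between maintenance steps so a pass started in steady bails out once restore/catchup begins (the step slice is illustrative):

```go
for _, step := range maintenanceSteps {
	if db.suppressBackgroundVlogGenerationForMaintenancePhase() {
		db.debugVlogMaintf("maintenance_abort reason=maintenance_phase phase=%s",
			maintenancePhaseString(uint32(db.MaintenancePhase())))
		return
	}
	step(db)
}
```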
💡 Codex Review
Reviewed commit: 1a744c4049
```go
if !db.vlogGenerationRewriteAgeBlockedWakeRunning.CompareAndSwap(false, true) {
	return
```
Re-arm age-block wake when deadline is shortened
This early return means a new setVlogGenerationRewriteAgeBlockedUntil call cannot reschedule the wake timer while a previous wake goroutine is already sleeping. If the new call sets an earlier deadline (for example after the candidate set changes and AgeBlockedMinRemainingAge drops), maintenance will still wait for the old, later timer before retrying, which can delay rewrite/GC far past the intended age gate and leave rewrite debt stalled longer than necessary.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
```go
if !db.vlogGenerationRewriteAgeBlockedWakeRunning.CompareAndSwap(false, true) {
	return
}
db.wg.Add(1)
go func(expectedUntil int64, delay time.Duration) {
go func() {
	defer db.wg.Done()
	if delay > 0 {
		timer := time.NewTimer(delay)
		select {
		case <-timer.C:
		case <-db.closeCh:
			if !timer.Stop() {
				<-timer.C
	defer db.vlogGenerationRewriteAgeBlockedWakeRunning.Store(false)
	for !db.closing.Load() {
		expectedUntil := db.vlogGenerationRewriteAgeBlockedUntilNS.Load()
		if expectedUntil <= 0 {
			return
		}
		deadline := time.Unix(0, expectedUntil)
		delay := time.Until(deadline)
		if delay > 0 {
			timer := time.NewTimer(delay)
			select {
			case <-timer.C:
			case <-db.closeCh:
				if !timer.Stop() {
					<-timer.C
				}
				return
			}
			timer.Stop()
		}
```
The age-blocked wake goroutine isn’t notified when setVlogGenerationRewriteAgeBlockedUntil updates the deadline while a wake is already running. If the new deadline is earlier than the timer currently being waited on, the goroutine will still sleep until the old (later) timer fires, delaying the retry beyond the intended min-age boundary. Consider adding a wake-up signal/resettable timer (or allowing a new goroutine when the deadline is shortened) so earlier deadlines take effect immediately.
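A minimal sketch of a re-armable wait, adding a notify case so an earlier deadline interrupts the current sleep; the wake channel is an illustrative addition, everything else follows the quoted loop:

```go
select {
case <-timer.C:
	// Deadline reached; fall through and retry maintenance.
case <-db.vlogGenerationRewriteAgeBlockedWakeCh: // assumed buffered chan struct{} signalled by setVlogGenerationRewriteAgeBlockedUntil
	timer.Stop()
	continue // recompute delay from the (possibly earlier) stored deadline
case <-db.closeCh:
	timer.Stop()
	return
}
```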
```go
b := db.NewBatch()
rng := rand.New(rand.NewSource(1))
const uniqueKeys = adaptiveMinWrites + 1200
value := make([]byte, 8<<10) // 8KiB
keys := make([][16]byte, uniqueKeys)

for i := 0; i < uniqueKeys; i++ {
	var key [16]byte
	binary.BigEndian.PutUint64(key[8:16], uint64(i))
	// First write introduces a new key; second write is an overwrite.
	if err := db.Set(key[:], value); err != nil {
		t.Fatalf("Set(first %d): %v", i, err)
	binary.BigEndian.PutUint64(keys[i][0:8], rng.Uint64())
	binary.BigEndian.PutUint64(keys[i][8:16], uint64(i))
	if err := b.Set(keys[i][:], value); err != nil {
		t.Fatalf("Batch.Set(first %d): %v", i, err)
	}
	if err := db.Set(key[:], value); err != nil {
		t.Fatalf("Set(overwrite %d): %v", i, err)
	}
for i := range uniqueKeys {
	j := rng.Intn(uniqueKeys)
	keys[i], keys[j] = keys[j], keys[i]
}
for i := 0; i < uniqueKeys; i++ {
	if err := b.Set(keys[i][:], value); err != nil {
		t.Fatalf("Batch.Set(overwrite %d): %v", i, err)
	}
}
if err := b.Write(); err != nil {
	t.Fatalf("Batch.Write: %v", err)
}
```
This test creates a caching.Batch via db.NewBatch() but never closes it. Batch.Close() returns pooled slices/arenas back to the DB; skipping it can inflate memory usage across the test suite and may affect later tests. Add a defer/explicit Close after Write (and handle its error) similar to other batch-using tests.
💡 Codex Review
Reviewed commit: e2c713da02
```go
if db.suppressBackgroundVlogGenerationForMaintenancePhase() {
	if opts.debugSource != "" {
		db.debugVlogMaintf(
			"maintenance_skip reason=maintenance_phase source=%s phase=%s checkpoint_pending=%t deferred_pending=%t",
```
Preserve checkpoint-pending retries when phase gate skips work
When maintenance is suppressed in restore/catchup, this early return drops out before any retry state is re-armed. In the checkpoint_pending path, schedulePendingVlogGenerationCheckpointKick has already flipped vlogGenerationCheckpointKickPending from true to false before launching retries, so a suppressed attempt here exits runVlogGenerationMaintenanceRetries with the pending bit cleared and no follow-up scheduled. If this happens while a phase transition is in progress, the queued checkpoint retry is effectively lost and won't run after returning to steady unless another checkpoint event re-queues it.
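A hedged sketch of one way to keep the retry alive, re-arming the pending bit when the gate suppresses a checkpoint-sourced pass (the source check is illustrative):

```go
if db.suppressBackgroundVlogGenerationForMaintenancePhase() {
	if opts.debugSource == "checkpoint_pending" { // illustrative: identify the checkpoint retry path
		// Re-arm the bit that schedulePendingVlogGenerationCheckpointKick
		// cleared, so the kick is retried once the phase returns to steady.
		db.vlogGenerationCheckpointKickPending.Store(true)
	}
	return
}
```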
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
```go
func (d *DB) Close() error {
	err := d.inner.Close()
	if d.phase != nil {
		phaseBus.unregister(d.phase)
	}
```
Close() unregisters the wrapped *treedb.DB from phaseBus only after calling d.inner.Close(). Since SetTracePhase() can concurrently iterate registered DBs and call db.SetMaintenancePhase(), this ordering allows the phase bridge to invoke SetMaintenancePhase during/after treedb.DB.Close() (which nils db.cached), creating a potential data race and post-close calls. Unregister the DB from phaseBus before closing the inner DB (ideally with a defer) so no further phase updates can target a closing instance.
```go
b := db.NewBatch()
rng := rand.New(rand.NewSource(1))
const uniqueKeys = adaptiveMinWrites + 1200
value := make([]byte, 8<<10) // 8KiB
keys := make([][16]byte, uniqueKeys)

for i := 0; i < uniqueKeys; i++ {
	var key [16]byte
	binary.BigEndian.PutUint64(key[8:16], uint64(i))
	// First write introduces a new key; second write is an overwrite.
	if err := db.Set(key[:], value); err != nil {
		t.Fatalf("Set(first %d): %v", i, err)
	binary.BigEndian.PutUint64(keys[i][0:8], rng.Uint64())
	binary.BigEndian.PutUint64(keys[i][8:16], uint64(i))
	if err := b.Set(keys[i][:], value); err != nil {
		t.Fatalf("Batch.Set(first %d): %v", i, err)
	}
	if err := db.Set(key[:], value); err != nil {
		t.Fatalf("Set(overwrite %d): %v", i, err)
	}
for i := range uniqueKeys {
	j := rng.Intn(uniqueKeys)
	keys[i], keys[j] = keys[j], keys[i]
}
```
This test won’t compile: for i := range uniqueKeys attempts to range over an int constant. Use an index loop (for i := 0; i < uniqueKeys; i++ { ... }) or range over a slice. Also, b := db.NewBatch() isn’t closed in this test; add defer b.Close() (or close after Write()) to match other batch-using tests.
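For reference, a sketch of both fixes; note that ranging over an integer is valid in Go 1.22+, so the index loop only matters for older toolchains, and the Batch.Close error handling is assumed to follow the other tests:

```go
for i := 0; i < uniqueKeys; i++ { // index loop works on all supported Go versions
	j := rng.Intn(uniqueKeys)
	keys[i], keys[j] = keys[j], keys[i]
}

if err := b.Write(); err != nil {
	t.Fatalf("Batch.Write: %v", err)
}
if err := b.Close(); err != nil {
	t.Fatalf("Batch.Close: %v", err)
}
```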
```go
if explicitSources && liveByID == nil {
	candidates = append(candidates, rewriteSourceSegment{
		fileID:    id,
		liveBytes: size,
	})
	continue
}
if liveByID == nil {
	candidates = append(candidates, rewriteSourceSegment{
		fileID:    id,
		liveBytes: size,
	})
	continue
```
The explicitSources && liveByID == nil branch duplicates the immediately following if liveByID == nil branch (both append the same segment and continue). This is dead/redundant logic; remove the explicitSources-specific branch to simplify the selection loop.
Summary
- Introduce maintenance phases (restore, catchup, steady) and gate background vlog generation maintenance by phase
- run_celestia validation

Why
Corrected real-node validation against gomap-gemini shows this minimal fix set is sufficient for both fast and wal_on_fast, while larger local runtime experiments are unnecessary and worse.

Validation
- wal_on_fast completed cleanly on the clean phase-hook base in ~297s
- fast completed cleanly on the clean phase-hook base in ~300s