treedb: improve live vlog maintenance observability#856
Conversation
Pull request overview
This PR improves TreeDB live value-log (vlog) maintenance observability by exposing vlog-generation metrics via expvar and adding additional byte-level counters to make rewrite planning and rewrite+GC progress visible.
Changes:
- Export `treedb.cache.vlog_generation.*` metrics through the expvar stat-selection filter.
- Add rewrite-plan selected-byte counters (total/live/stale) and a cumulative rewrite “reclaimed bytes” counter.
- Add unit tests covering new expvar exposure and rewrite-plan byte accounting paths.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| TreeDB/caching/db.go | Adds new vlog-generation stats counters (plan-selected bytes + reclaimed bytes) and emits them via Stats(). |
| TreeDB/caching/expvar_stats.go | Extends expvar filtering to include treedb.cache.vlog_generation.*. |
| TreeDB/caching/expvar_stats_test.go | Verifies expvar selection/coercion includes a representative vlog_generation metric. |
| TreeDB/caching/vlog_generation_scheduler_test.go | Adds unit tests for rewrite-plan outcome observation and byte counter tracking (including fallback path). |
```go
strings.HasPrefix(k, "treedb.cache.vlog_dict.") ||
strings.HasPrefix(k, "treedb.cache.vlog_generation.") ||
strings.HasPrefix(k, "treedb.cache.vlog_payload_kind.") ||
```
The selection comment says expvar exports only process-wide metric families plus a few cache/backend families (mmap/decode/batch-arena), but this function now also exports the full treedb.cache.vlog_generation.* family. Please update the comment to reflect the expanded scope so future readers don’t miss that maintenance stats are intentionally exposed via expvar.
Follow-up slice pushed in

What changed

run_celestia validation (fast profile)

Command:
Run home:
Latest diagnostics snapshot sampled:
Key counters:

Takeaway: in this early state-sync window, rewrite/vacuum did not trigger, and the dominant behavior was high active-pass contention. We now have explicit counters to drive the next tuning step.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9ce5339a4e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```go
db.scheduleDueVlogGenerationDeferredMaintenance()
db.schedulePendingVlogGenerationCheckpointKick()
rewriteRan := db.vlogGenerationRewriteRuns.Load() > rewriteRunsBefore
gcRan := db.vlogGenerationGCRuns.Load() > gcRunsBefore
```
Count pass outcomes before queuing follow-up maintenance
The new pass-outcome counters are computed after scheduleDueVlogGenerationDeferredMaintenance()/schedulePendingVlogGenerationCheckpointKick() run in the defer path, so a just-scheduled retry goroutine can increment vlogGenerationRewriteRuns/vlogGenerationGCRuns first and make this pass look like it performed rewrite/GC when it did not. This appears under contention (fast retry wakeups on another core) and skews the new observability metrics (maintenance.passes.*) by overcounting rewrite/gc and undercounting noop for the current pass.
Pushed another follow-up in

Code change

Validation

A/B run_celestia comparison

Both runs used:
Baseline run home:
New run home:
Key counter deltas:

This looks like a material reduction in scheduler contention, and maintenance made actual rewrite/vacuum progress in the same early-state-sync lab window.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
```go
db.vlogGenerationMaintenanceActive.Store(false)
// If a deferred confirmation/age wake became due while this pass held the
// scheduler active, requeue it immediately on exit instead of relying on
// the original retry goroutine to still be alive.
db.scheduleDueVlogGenerationDeferredMaintenance()
db.schedulePendingVlogGenerationCheckpointKick()
rewriteRan := db.vlogGenerationRewriteRuns.Load() > rewriteRunsBefore
gcRan := db.vlogGenerationGCRuns.Load() > gcRunsBefore
if rewriteRan {
	db.vlogGenerationMaintenancePassWithRewrite.Add(1)
}
if gcRan {
	db.vlogGenerationMaintenancePassWithGC.Add(1)
}
if !rewriteRan && !gcRan {
	db.vlogGenerationMaintenancePassNoop.Add(1)
}
```
The maintenance pass outcome counters (passes.noop/with_rewrite/with_gc) can be mis-attributed due to the defer order: maintenanceActive is cleared and new maintenance may be scheduled before computing rewriteRan/gcRan from the global counters. If a deferred/checkpoint-kick pass starts quickly after maintenanceActive.Store(false), rewriteRan/gcRan may reflect the next pass instead of the one being released. Consider computing/incrementing the pass outcome counters before clearing vlogGenerationMaintenanceActive and before scheduling any follow-up maintenance (or track rewrite/gc execution via local booleans set during this pass).
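A minimal sketch of the suggested reordering, with stand-in types and counter names (this is illustrative, not the actual TreeDB code): compute the pass outcome from the run-counter deltas before the scheduler is released or any follow-up maintenance is queued, so a fast follow-up pass cannot skew attribution.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// scheduler is a hypothetical stand-in mirroring the counters discussed in
// the review: global rewrite/GC run counters plus per-pass outcome counters.
type scheduler struct {
	rewriteRuns, gcRuns           atomic.Int64
	passRewrite, passGC, passNoop atomic.Int64
	active                        atomic.Bool
}

func (s *scheduler) runPass(doRewrite, doGC bool) {
	rewriteBefore := s.rewriteRuns.Load()
	gcBefore := s.gcRuns.Load()
	defer func() {
		// Outcome accounting first, while this pass still owns the scheduler:
		// the deltas cannot yet include a freshly scheduled follow-up pass.
		rewriteRan := s.rewriteRuns.Load() > rewriteBefore
		gcRan := s.gcRuns.Load() > gcBefore
		if rewriteRan {
			s.passRewrite.Add(1)
		}
		if gcRan {
			s.passGC.Add(1)
		}
		if !rewriteRan && !gcRan {
			s.passNoop.Add(1)
		}
		// Only now release the scheduler; the real code would also queue
		// follow-up maintenance here.
		s.active.Store(false)
	}()
	if doRewrite {
		s.rewriteRuns.Add(1)
	}
	if doGC {
		s.gcRuns.Add(1)
	}
}

func main() {
	var s scheduler
	s.active.Store(true)
	s.runPass(true, false)
	fmt.Println(s.passRewrite.Load(), s.passGC.Load(), s.passNoop.Load()) // 1 0 0
}
```

An equivalent alternative, also mentioned in the review, is to set local booleans at the points where rewrite/GC actually execute and skip the global-counter comparison entirely.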
Pushed

Code change

Validation

Third comparison run (same lab command)

Run home:
Snapshot:
Counters:

Trend across runs:

So the latest change primarily removes maintenance churn/noops while preserving rewrite/vacuum progress.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c83e2d6164
```go
!db.vlogGenerationCheckpointKickPending.Load() &&
	!db.vlogGenerationDeferredMaintenancePending.Load() &&
	!db.vlogGenerationDeferredMaintenanceDue(now) {
	return false
```
Surface deferred-due check errors before preflight skip
This new hot-foreground preflight can return early without entering maybeRunVlogGenerationMaintenance, but the !db.vlogGenerationDeferredMaintenanceDue(now) term internally treats rewrite-stage read errors as a plain false result. In sustained non-quiet workloads with no pending flags, backend errors from the deferred-due probe are now silently ignored, so scheduler error state and notifyError are never triggered until the node eventually becomes quiet. That regresses operational visibility compared with the previous path, which always entered maintenance and reported queue/stage load failures immediately.
Instrumentation-first update pushed to

What this adds

Test coverage

Next step: run
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
```go
strings.HasPrefix(k, "treedb.cache.vlog_payload_split.") ||
strings.HasPrefix(k, "treedb.cache.vlog_auto.") ||
strings.HasPrefix(k, "treedb.cache.vlog_dict.") ||
strings.HasPrefix(k, "treedb.cache.vlog_generation.") ||
strings.HasPrefix(k, "treedb.cache.vlog_payload_kind.") ||
```
The inline comment above this prefix list says only certain cache/backend families are exported via expvar, but treedb.cache.vlog_generation.* is now included. Please update the comment to mention vlog_generation (or reword it so it stays accurate as this allowlist evolves).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 31fbb0a53e
```go
stats, err := rewriter.ValueLogRewriteOnline(ctx, rewriteOpts)
cancel()
rewriteDur := time.Since(rewriteStart)
db.observeVlogGenerationRewriteExecDuration(rewriteDur)
```
Count rewrite exec duration only for successful runs
observeVlogGenerationRewriteExecDuration is called before the rewrite error check, so canceled/failed rewrites contribute to treedb.cache.vlog_generation.rewrite.exec.{total,max}_ms, but rewrite.runs (the denominator for rewrite.exec.avg_ms in Stats()) is incremented only on success. Under repeated failures this yields misleading observability (for example non-zero total/max with avg_ms=0.000, or inflated averages), which can skew maintenance tuning and experiment interpretation.
Follow-up pushed:

Why

In a longer

This indicated retry loops were still attempting while

Change

Validation

A/B
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7cc50d6de3
```go
if db.vlogGenerationMaintenanceActive.Load() {
	if time.Now().After(deadline) {
		return
```
Preserve checkpoint retry intent while pass is active
When runVlogGenerationMaintenanceRetries is used for checkpoint kicks (stopWhenAcquired=false) and another maintenance pass remains active longer than vlogGenerationCheckpointKickRetryWindow (5s), this new branch only sleeps and then returns without ever invoking maybeRunVlogGenerationMaintenanceWithOptions. That means no collision path runs to re-set vlogGenerationCheckpointKickPending, so the active pass’s deferred schedulePendingVlogGenerationCheckpointKick() sees no pending work and the checkpoint-triggered follow-up maintenance can be dropped entirely.
```go
gcStats, err := gcer.ValueLogGC(ctx, gcOpts)
cancel()
db.observeVlogGenerationGCExecDuration(time.Since(gcStart))
if err != nil {
```
Record GC exec duration only for successful runs
observeVlogGenerationGCExecDuration is called before checking err, but vlogGenerationGCRuns (used as the denominator for gc.exec.avg_ms in Stats()) is incremented only on success. Failed or canceled GC calls therefore increase gc.exec.total_ms/max_ms without increasing gc.runs, producing inconsistent and inflated observability during error periods.
```go
db.observeVlogGenerationVacuumExecDuration(time.Since(vacuumStart))
if err != nil {
	db.vlogGenerationVacuumFailures.Add(1)
```
Record vacuum exec duration only for successful runs
The vacuum execution duration is observed before the error check, while vlogGenerationVacuumRuns (the denominator for vacuum.exec.avg_ms) is incremented only after success. If vacuum fails, vacuum.exec.total_ms/max_ms still grow but vacuum.runs does not, so the new average metrics can report misleading values (including non-zero totals with zero average).
Follow-up update pushed:

What was added:

Validation:

Live run (baseline fast profile, no trigger override):

Interpretation:

Offline headroom on same home:

So live maintenance remains significantly below the post-rewrite floor, and these counters now make that bottleneck explicit.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
TreeDB/db/vlog_gc.go:270
- ValueLogGC() returns early on opts.DryRun before computing SegmentsPending/BytesPending (and the observed-source pending fields). If these fields are intended to represent "eligible - deleted" backlog, then in DryRun (where deleted=0) they likely should be populated as pending=eligible, or the struct/docs should explicitly state that *Pending is only set for non-dry-run GC runs.
```go
if opts.DryRun {
	continue
}
if err := vm.MarkZombie(id); err != nil {
	return stats, err
}
candidates[id] = candidate{path: f.Path, size: size, observed: observed}
}
if opts.DryRun {
	if set != nil {
		_ = vm.Release(set)
	}
	db.persistValueLogRefTrackerBestEffort()
	return stats, nil
```
```go
if v, ok := envInt(envVlogMaxRetainedBytes); ok {
	opts.ValueLog.MaxRetainedBytes = int64(v)
}
if v, ok := envInt(envVlogMaxRetainedBytesHard); ok {
	opts.ValueLog.MaxRetainedBytesHard = int64(v)
}
if v, ok := envInt(envVlogRewriteBudgetBytesPerSec); ok {
	opts.ValueLog.Generational.RewriteBudgetBytesPerSec = int64(v)
}
if v, ok := envInt(envVlogRewriteBudgetRecordsPerSec); ok {
	opts.ValueLog.Generational.RewriteBudgetRecordsPerSec = v
}
if v, ok := envInt(envVlogRewriteTriggerTotalBytes); ok {
	opts.ValueLog.Generational.RewriteTriggerTotalBytes = int64(v)
}
if v, ok := envInt(envVlogRewriteTriggerStaleRatioPPM); ok {
	if v < 0 {
		v = 0
	}
	opts.ValueLog.Generational.RewriteTriggerStaleRatioPPM = uint32(v)
}
if v, ok := envInt(envVlogRewriteTriggerChurnPerSec); ok {
	opts.ValueLog.Generational.RewriteTriggerChurnPerSec = int64(v)
}
```
applyEnvMaintenanceOverrides() parses several env vars documented as int64 (e.g. TREEDB_VLOG_MAX_RETAINED_BYTES*, TREEDB_VLOG_REWRITE_*_BYTES_PER_SEC) via envInt(), which uses strconv.Atoi into an int. This can overflow / reject valid large int64 values (and is inconsistent with the int64 doc comments). Consider adding an envInt64 helper (strconv.ParseInt) and using it for int64-backed options (and keeping envInt for true int fields like *_RECORDS_PER_SEC).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 431d323f84
```go
	return nil
	}
}
now := time.Now()
db.vlogGenerationLastGCUnixNano.Store(now.UnixNano())
ctx, cancel := db.foregroundMaintenanceContext(30 * time.Second)
gcOpts := backenddb.ValueLogGCOptions{ProtectedPaths: db.valueLogProtectedPaths()}
ctx, cancel := db.vlogGenerationMaintenanceContext(30*time.Second, opts)
```
Preserve close cancellation in bypass-quiet GC maintenance
This GC call now uses vlogGenerationMaintenanceContext(30*time.Second, opts), and in checkpoint-kick/deferred paths (opts.bypassQuiet == true) that helper returns a timeout-only context that is not canceled by db.closeCh. If Close() happens while ValueLogGC is blocked, the maintenance goroutine can keep running until timeout/backend return, which can delay shutdown by up to 30s; this is a regression from the previous close-aware foregroundMaintenanceContext behavior on this path.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 2 comments.
```python
if value is None or math.isnan(value):
    return "n/a"
n = float(value)
```
human_bytes calls math.isnan(value) before coercing to float. Many call sites pass ints (e.g., from counters), and math.isnan(int) raises TypeError, causing the script to crash. Coerce to float first (or guard with isinstance(value, float) / try-except) before calling math.isnan.
Suggested change:

```python
if value is None:
    return "n/a"
try:
    n = float(value)
except (TypeError, ValueError):
    return "n/a"
if math.isnan(n):
    return "n/a"
```
```go
removed = true
if size > 0 {
	out.RemovedSegments++
	out.RemovedBytes += size
}
if candidate.observed {
	out.ObservedSourceRemovedSegments++
	if size > 0 {
		out.ObservedSourceRemovedBytes += size
	}
}
continue
}
if db.cleanupMissingRetainedValueLog(path) {
	if size > 0 {
		out.RemovedSegments++
		out.RemovedBytes += size
	}
```
Same issue as earlier: when cleaning up orphaned/missing retained logs, RemovedSegments/RemovedBytes are only updated when size > 0, which can undercount removals if the size wasn't tracked. Increment the segment counter regardless of size knowledge.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e7dd2a3448
```go
gcOpts := db.valueLogGCOptions(false)
if forceObservedSourceGC {
	gcOpts.ObservedSourceFileIDs = append([]uint32(nil), observedSourceGCIDs...)
	db.vlogGenerationObservedGCRuns.Add(1)
```
Increment observed GC run counter only after successful GC
The new observed-source GC run metric is incremented before ValueLogGC returns, so failed or canceled GC attempts are counted as completed runs. This happens when forceObservedSourceGC is true and gcer.ValueLogGC returns an error; the code re-queues IDs on error, but treedb.cache.vlog_generation.observed_gc.runs has already been increased, which makes run-count telemetry inaccurate during failure periods.
Pull request overview
Copilot reviewed 17 out of 18 changed files in this pull request and generated 1 comment.
```go
if f.Path != "" {
	if info, err := os.Stat(f.Path); err == nil && info.Size() > 0 {
		return uint64(info.Size())
	}
}
```
ZombieStats is called from Stats/expvar paths and currently may call os.Stat(f.Path) via valueLogFileSizeBestEffort while holding m.mu.RLock(). Doing filesystem I/O under the manager lock can introduce latency and block writers/maintenance. Consider avoiding os.Stat here (e.g., rely on cached fileSize/mmap length only) or snapshot zombie file paths/sizes under the lock and perform any best-effort stat calls after unlocking (optionally caching results back into fileSize).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e7ef33865a
```go
if size > 0 {
	out.RemovedSegments++
	out.RemovedBytes += size
```
Count removed prune segments even when size is unknown
This branch only increments RemovedSegments when size > 0, but valueLogClosedSegmentSize returns 0 when a retained path has no tracked closed-size entry (for example after reopen or metadata loss). In that case the segment is still pruned by cleanupMissingRetainedValueLog, yet treedb.cache.vlog_retained_prune.removed_segments is not incremented, and observed-source removed counts can exceed global removed counts. This introduces inconsistent retained-prune telemetry for operators and automation.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a9e6fc3a9a
```python
passes_total = m["maintenance_noop"] + m["maintenance_with_rewrite"] + m["maintenance_with_gc"]
m["maintenance_passes_total"] = passes_total
```
Derive maintenance pass totals from mutually exclusive counters
maintenance_with_rewrite and maintenance_with_gc are not mutually exclusive (a single pass can increment both in maybeRunVlogGenerationMaintenanceWithOptions), so summing noop + with_rewrite + with_gc overcounts total passes whenever rewrite and GC run together. This inflates maintenance_passes_total and skews the rewrite/gc share percentages used by this analyzer, which can mislead capacity interpretation during mixed rewrite+gc workloads.
```python
m["maintenance_skip"] = skip_map
m["maintenance_skip_total"] = sum(skip_map.values())
```
Exclude umbrella stage_gate from skip-total aggregation
maintenance_skip_total is computed as a raw sum of all skip counters, but stage_gate is an umbrella counter that is incremented together with stage_gate_not_due or stage_gate_due_reserved for the same skip event. This double-counts stage-gate skips and overstates total skip pressure, which can lead operators to overestimate scheduler contention in diagnostics.
Pull request overview
Copilot reviewed 17 out of 18 changed files in this pull request and generated 3 comments.
```go
m.mu.RLock()
for _, f := range m.files {
	if f == nil || !f.IsZombie.Load() {
		continue
	}
	segments++
	size := valueLogFileSizeBestEffort(f)
	bytes += size
	if f.RefCount.Load() > 0 {
		pinnedSegments++
		pinnedBytes += size
		continue
	}
	unpinnedSegments++
	unpinnedBytes += size
}
m.mu.RUnlock()
```
ZombieStats() holds m.mu.RLock() while calling valueLogFileSizeBestEffort(), which may perform os.Stat(f.Path). Doing filesystem I/O under the manager lock can block writers/maintenance and increase tail latency. Consider snapshotting the zombie *File pointers (or their Path + cached size fields) under the lock, then compute sizes (including any os.Stat) after releasing the lock.
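The snapshot-then-stat pattern the review suggests could look like this (the `file`/`manager` types below are simplified stand-ins, not the real Manager): copy the zombie entries and any cached sizes under the RLock, release it, and only then fall back to `os.Stat` for entries whose size is unknown.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// file is a stand-in: path plus a cached size (0 = not cached).
type file struct {
	path string
	size uint64
}

type manager struct {
	mu    sync.RWMutex
	files []*file
}

// zombieBytes computes total zombie bytes without doing filesystem I/O
// while the manager lock is held.
func (m *manager) zombieBytes() uint64 {
	// Phase 1: cheap value snapshot under the lock.
	m.mu.RLock()
	snap := make([]file, 0, len(m.files))
	for _, f := range m.files {
		if f != nil {
			snap = append(snap, *f)
		}
	}
	m.mu.RUnlock()

	// Phase 2: best-effort stat OUTSIDE the lock for uncached sizes.
	var total uint64
	for _, f := range snap {
		size := f.size
		if size == 0 && f.path != "" {
			if info, err := os.Stat(f.path); err == nil && info.Size() > 0 {
				size = uint64(info.Size())
			}
		}
		total += size
	}
	return total
}

func main() {
	m := &manager{files: []*file{
		{path: "/nonexistent/vlog-0001"}, // stat fails, contributes 0
		{size: 8192},                     // cached size, no I/O needed
	}}
	fmt.Println(m.zombieBytes())
}
```

Optionally, the post-stat sizes could be cached back into the manager afterwards (under a brief write lock) to amortize future calls.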
```go
strings.HasPrefix(k, "treedb.cache.vlog_payload_split.") ||
	strings.HasPrefix(k, "treedb.cache.vlog_auto.") ||
	strings.HasPrefix(k, "treedb.cache.vlog_dict.") ||
	strings.HasPrefix(k, "treedb.cache.vlog_generation.") ||
	strings.HasPrefix(k, "treedb.cache.vlog_retained_prune.") ||
	strings.HasPrefix(k, "treedb.cache.vlog_payload_kind.") ||
	strings.HasPrefix(k, "treedb.cache.vlog_outer_leaf_codec.") ||
	strings.HasPrefix(k, "treedb.cache.batch_arena.") {
```
The expvar selector now exports treedb.cache.vlog_generation.* and treedb.cache.vlog_retained_prune.*, but the capacity analyzer/runbook also relies on treedb.cache.vlog_zombie.* keys. Since that prefix is not allowlisted here, *.debug_vars.json snapshots won't include zombie inventory metrics and the report will silently show zeros. Add strings.HasPrefix(k, "treedb.cache.vlog_zombie.") to the allowlist (and consider a small selector test for one zombie key).
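A minimal stand-in for the suggested change and selector test (the real allowlist lives in TreeDB/caching/expvar_stats.go; the function and keys below are illustrative): add the `vlog_zombie` family and assert a representative key passes.

```go
package main

import (
	"fmt"
	"strings"
)

// exported mimics the prefix allowlist shape used by the expvar selector,
// reduced to the families relevant here, with vlog_zombie newly added.
func exported(k string) bool {
	return strings.HasPrefix(k, "treedb.cache.vlog_generation.") ||
		strings.HasPrefix(k, "treedb.cache.vlog_retained_prune.") ||
		strings.HasPrefix(k, "treedb.cache.vlog_zombie.") // newly allowlisted
}

func main() {
	fmt.Println(exported("treedb.cache.vlog_zombie.pinned_bytes"))
	fmt.Println(exported("treedb.cache.internal.debug_only"))
}
```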
```go
// Value-log dictionary compression knobs (cached mode).
//
// Enabling dict compression requires:
//   - ValueLog compression mode that allows dicts (auto/dict), and
//   - Dict training enabled (TrainBytes > 0), and
//   - Side stores enabled (dictdb), and
//   - Split value log enabled (value pointers used).
envVlogDictEnable                  = "TREEDB_VLOG_DICT_ENABLE"                      // bool
envVlogDictTrainBytes              = "TREEDB_VLOG_DICT_TRAIN_BYTES"                 // int
envVlogDictBytes                   = "TREEDB_VLOG_DICT_BYTES"                       // int
envVlogDictMinRecords              = "TREEDB_VLOG_DICT_MIN_RECORDS"                 // int
envVlogDictMaxRecordBytes          = "TREEDB_VLOG_DICT_MAX_RECORD_BYTES"            // int
envVlogDictSampleStride            = "TREEDB_VLOG_DICT_SAMPLE_STRIDE"               // int
envVlogDictDedupWindow             = "TREEDB_VLOG_DICT_DEDUP_WINDOW"                // int
envVlogDictTrainLevel              = "TREEDB_VLOG_DICT_TRAIN_LEVEL"                 // int
envVlogDictMaxK                    = "TREEDB_VLOG_DICT_MAX_K"                       // int
envVlogDictClassMode               = "TREEDB_VLOG_DICT_CLASS_MODE"                  // single|split_outer_leaf
envVlogDictZstdLevel               = "TREEDB_VLOG_DICT_ZSTD_LEVEL"                  // fastest|default|better|best|int
envVlogDictEntropy                 = "TREEDB_VLOG_DICT_ENTROPY"                     // bool
envVlogDictAdaptiveRatio           = "TREEDB_VLOG_DICT_ADAPTIVE_RATIO"              // float64
envVlogDictMinPayloadSavings       = "TREEDB_VLOG_DICT_MIN_PAYLOAD_SAVINGS_RATIO"   // float64
envVlogMaxRetainedBytes            = "TREEDB_VLOG_MAX_RETAINED_BYTES"               // int64
envVlogMaxRetainedBytesHard        = "TREEDB_VLOG_MAX_RETAINED_BYTES_HARD"          // int64
envVlogRewriteBudgetBytesPerSec    = "TREEDB_VLOG_REWRITE_BUDGET_BYTES_PER_SEC"     // int64
envVlogRewriteBudgetRecordsPerSec  = "TREEDB_VLOG_REWRITE_BUDGET_RECORDS_PER_SEC"   // int
envVlogRewriteTriggerTotalBytes    = "TREEDB_VLOG_REWRITE_TRIGGER_TOTAL_BYTES"      // int64
envVlogRewriteTriggerStaleRatioPPM = "TREEDB_VLOG_REWRITE_TRIGGER_STALE_RATIO_PPM"  // uint32
envVlogRewriteTriggerChurnPerSec   = "TREEDB_VLOG_REWRITE_TRIGGER_CHURN_PER_SEC"    // int64
```
The new TREEDB_VLOG_* retained/rewrite env vars are grouped under the “Value-log dictionary compression knobs” header, but they’re unrelated to dict compression. This makes the comment above the block misleading and makes the env list harder to scan. Consider moving these constants to a separate section (e.g., “Value-log maintenance knobs”) or updating the header comment to reflect the expanded scope.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8e9a018a23
```go
strings.HasPrefix(k, "treedb.cache.vlog_generation.") ||
	strings.HasPrefix(k, "treedb.cache.vlog_retained_prune.") ||
```
Include vlog_zombie keys in expvar whitelist
The commit adds treedb.cache.vlog_zombie.* counters in DB.Stats() (used by the new maintenance analyzer), but selectTreeDBExpvarStats still whitelists only vlog_generation and vlog_retained_prune families here. In environments that consume diagnostics via this expvar filter (the default run-celestia path), zombie metrics are dropped and the analyzer’s zombie inventory fields read as zero, masking pinned-zombie buildup and skewing reclaim diagnostics.
Pull request overview
Copilot reviewed 17 out of 18 changed files in this pull request and generated 1 comment.
```go
m.mu.RLock()
for _, f := range m.files {
	if f == nil || !f.IsZombie.Load() {
		continue
	}
	segments++
	size := valueLogFileSizeBestEffort(f)
	bytes += size
	if f.RefCount.Load() > 0 {
		pinnedSegments++
		pinnedBytes += size
		continue
	}
	unpinnedSegments++
	unpinnedBytes += size
}
m.mu.RUnlock()
return segments, bytes, pinnedSegments, pinnedBytes, unpinnedSegments, unpinnedBytes
```
ZombieStats holds Manager.mu.RLock while calling valueLogFileSizeBestEffort, which can fall back to os.Stat on f.Path. Doing filesystem I/O under the manager lock can introduce latency/lock contention if fileSize/mmapData are unset (e.g., newly discovered or partially initialized files). Consider collecting the zombie file pointers/paths under the lock, then computing best-effort sizes outside the lock (or restricting to cached size only) to keep stats reporting non-blocking.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: db46ff7081
```go
protectedAll := mergeUniqueNonEmptyPaths(opts.ProtectedPaths, opts.ProtectedInUsePaths, opts.ProtectedRetainedPaths)
if len(protectedAll) > 0 {
	if recent := recentValueLogIDsForProtectedPaths(set, valueLogKeepRecentSegmentsPerLane, protectedAll); len(recent) > 0 {
		keptIDs = recent
```
Preserve active IDs when narrowing GC lanes
keptIDs starts as currentValueLogIDs(set) but is then replaced with recent when any protected-path set is provided. With the new ProtectedInUsePaths/ProtectedRetainedPaths API, callers can validly pass only one class of paths; in that case recentValueLogIDsForProtectedPaths keeps only those lanes, so current writable segments in other lanes can be treated as eligible and marked zombie/deleted if currently unreferenced. This can prune active segments outside the protected lane subset; merge recent into keptIDs instead of overwriting it.
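The suggested fix, sketched with a hypothetical helper name (`mergeKeptIDs` does not exist in the codebase): union the recent protected-lane IDs into the current-writable set instead of replacing it, so active segments outside the protected lanes stay kept.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeKeptIDs returns the deduplicated union of the current writable
// segment IDs and the recent protected-lane IDs, so narrowing the
// protected-path set can only ADD kept segments, never drop active ones.
func mergeKeptIDs(current, recent []uint32) []uint32 {
	seen := make(map[uint32]struct{}, len(current)+len(recent))
	out := make([]uint32, 0, len(current)+len(recent))
	for _, ids := range [][]uint32{current, recent} {
		for _, id := range ids {
			if _, ok := seen[id]; ok {
				continue
			}
			seen[id] = struct{}{}
			out = append(out, id)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	current := []uint32{10, 11} // active writable segments across lanes
	recent := []uint32{11, 42}  // recent segments in the protected lanes only
	fmt.Println(mergeKeptIDs(current, recent))
}
```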
Summary
- Export `treedb.cache.vlog_generation.*` metrics through expvar selection

Testing

- `go test ./TreeDB/caching -count=1`
- `go test ./TreeDB/caching -run 'TestSelectTreeDBExpvarStatsFiltersAndCoerces|TestObserveVlogGenerationRewritePlanOutcome_SelectedTracksBytes|TestObserveVlogGenerationRewritePlanOutcome_SelectedTracksSegmentFallbackBytes' -count=1`