decomp: isolate layer-a plus layer-c runtime rewrite maintenance #843
snissn wants to merge 55 commits into pr/vlog-maint-v2-runtime-debt from
Conversation
Pull request overview
This PR promotes the decomposed “A + C” runtime value-log rewrite maintenance path: it improves correctness of rewrite planning for explicit SourceFileIDs by estimating live bytes when stale filters are applied (layer A), and changes checkpoint-kick rewrite resume behavior to process a single queued segment then restage remaining debt for later confirmation (layer C). It also adds tests/bench coverage around the new behavior and introduces additional plan telemetry for “age-blocked” segments.
Changes:
- Add age-blocked selection stats to ValueLogRewritePlan, and estimate live bytes for explicit SourceFileIDs when stale filters require it.
- Persist and schedule "rewrite stage pending" state so checkpoint-kick resume rewrites process one segment and restage remaining debt for later confirmation.
- Add/extend tests and a benchmark to validate restaging and retained-source cleanup behavior, including a real-backend checkpoint-kick test.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| TreeDB/db/vlog_rewrite.go | Adds age-blocked plan fields, selection stats plumbing, and conditional live-byte estimation for SourceFileIDs with stale filters. |
| TreeDB/db/vlog_rewrite_test.go | Updates selection tests for new return signature/stats; adds tests covering SourceFileIDs + stale filters + live estimation. |
| TreeDB/caching/db.go | Adds retained-source cleanup after rewrite+GC and integrates restaging behavior into checkpoint-kick maintenance execution. |
| TreeDB/caching/vlog_generation_state.go | Persists stage-pending/observed timestamps; adds restage helper and ledger stabilization helpers. |
| TreeDB/caching/vlog_generation_state_test.go | Updates state loader tests for new stage fields. |
| TreeDB/caching/vlog_retained_manager_drift_test.go | Adds a test ensuring processed retained rewrite sources are cleaned up when backend references are gone. |
| TreeDB/caching/vlog_generation_scheduler_test.go | Adds a real-backend test for checkpoint-kick restaging and a benchmark harness for drain window behavior. |
```go
for i := 0; i < b.N; i++ {
	db, recorder, initialQueue := seedRealBackendRewriteQueueScenario(b, 25*time.Millisecond)
	b.StopTimer()
	var (
		totalStart = time.Now()
		maxKick    time.Duration
```
In BenchmarkVlogGenerationRewriteQueueDrainCheckpointKick, the expensive setup (seedRealBackendRewriteQueueScenario) currently runs while the benchmark timer is active, so the reported timings and metrics include DB creation and workload-seeding time. Also, b.ReportMetric is called inside the per-iteration loop; only the last call per unit is retained in the final BenchmarkResult, so earlier iterations are effectively discarded. Consider stopping the timer before seeding, starting it immediately before the drain loop (and setting totalStart after StartTimer), and reporting metrics once per benchmark run (or restructure so b.N controls the drain work rather than repeating the full setup under the timer).
```go
func mergeConfirmedVlogGenerationRewriteLedgerSegments(prev, planned []backenddb.ValueLogRewritePlanSegment) []backenddb.ValueLogRewritePlanSegment {
	if len(prev) == 0 || len(planned) == 0 {
		return nil
	}
	stable := stableVlogGenerationRewriteLedgerSegments(prev, planned)
	if len(stable) == 0 {
		return nil
	}
	if len(stable) == len(planned) {
		return stable
	}
	seen := make(map[uint32]struct{}, len(stable))
	out := make([]backenddb.ValueLogRewritePlanSegment, 0, len(planned))
	for _, seg := range stable {
		if seg.FileID == 0 || seg.BytesLive <= 0 {
			continue
		}
		seen[seg.FileID] = struct{}{}
		out = append(out, seg)
	}
	for _, seg := range planned {
		if seg.FileID == 0 || seg.BytesLive <= 0 {
			continue
		}
		if _, ok := seen[seg.FileID]; ok {
			continue
		}
		out = append(out, seg)
	}
	return out
}
```
mergeConfirmedVlogGenerationRewriteLedgerSegments is introduced here but does not appear to be referenced anywhere in the repository. If it’s not part of the intended behavior in this PR, consider removing it to avoid carrying unused code (or add the call site if it is required).
💡 Codex Review
gomap/TreeDB/db/vlog_rewrite.go
Lines 950 to 954 in 0b16b54
When SourceFileIDs is combined with stale filters, rewritePlanNeedsLiveEstimate() now routes the explicit-ID case through selectRewriteSourceSegments, and this block starts enforcing MaxSourceSegments/MaxSourceBytes even though ValueLogRewriteOnlineOptions says those caps apply only when SourceFileIDs is empty (TreeDB/db/vlog_rewrite.go:102-106). Any caller that narrows a preconfigured sparse-plan options struct to a fixed set of file IDs will now silently rewrite only a subset of the requested IDs, leaving the rest of that explicit rewrite set untouched.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```go
if len(db.vlogGenerationRewriteLedger) == 0 || len(db.vlogGenerationRewriteQueue) == 0 {
	return len(db.vlogGenerationRewriteQueue), false, nil
}
```
Restage queue-only debt even when no ledger was persisted
This early return means the new restage path only works for queues backed by SelectedSegments. The scheduler still creates queue-only debt via setVlogGenerationRewriteQueue(rewritePlan.SourceFileIDs) in TreeDB/caching/db.go:12943-12945 whenever a planner returns SourceFileIDs without SelectedSegments—which is still allowed by ValueLogRewritePlan (SelectedSegments is documented as present only "when" live estimates were produced in TreeDB/db/vlog_rewrite.go:56-60). In that case a checkpoint-coupled resume never sets RewriteStagePending, so the remaining tail is not deferred and this commit’s hot-window bound disappears for those planners or reopened queue-only states.
Added follow-on commit
What changed:
New focused validation:
Local gate:
Checkpoint-kick benchmark after this change:
So the narrow refresh did not reopen the hot checkpoint-kick drain behavior.
Sanity:
Caveat:
💡 Codex Review
Lines 12773 to 12776 in 8858d34
If a staged confirmation pass narrows the debt set (for example, the planner now returns only file 22 out of a staged [11,22]), this only updates the local plan. The execution path later still resumes from the persisted rewriteQueue/vlogGenerationRewriteLedger, so vlogGenerationRewriteLedgerChunk can pick and rewrite a segment that the confirmation planner just dropped. In that case the new two-phase “observe, then confirm” flow does not actually constrain which staged segments are executed.
Lines 12778 to 12780 in 8858d34
When confirmed is empty here, the code only changes reason and leaves RewriteStagePending plus the old observedAt intact. After that, generic periodic/checkpoint passes keep returning early because staged work is still pending, while scheduleDueVlogGenerationDeferredMaintenance immediately requeues another stage-confirm wake because that timestamp is already overdue. A segment that stops qualifying during the confirmation delay can therefore trap the scheduler in a replan loop and block later GC/rewrite work.
TreeDB/caching/db.go
```go
if hasPlanner && db.shouldRefreshQueuedVlogGenerationRewriteLedger(opts, stagePending, rewriteQueue, rewriteLedger) {
	queueMinStaleRatio := db.vlogGenerationRewriteMinStaleRatioForQueuedDebt(totalBytes, vlogGenerationReasonRewriteResume)
	refreshedPlan, refreshErr := db.refreshQueuedVlogGenerationRewritePlan(planner, rewriteQueue, queueMinStaleRatio, opts)
```
Gate queued-ledger refresh behind the quiet-window check
This refresh planner runs before the later !quiet && !opts.bypassQuiet guard in maybeRunVlogGenerationMaintenanceWithOptions. In WAL-off/relaxed mode, a periodic pass with queued rewrite debt can now call ValueLogRewritePlan during hot ingest, and after this commit that planner may rescan live bytes even for explicit SourceFileIDs. That defeats the quiet-window contract and reintroduces expensive tree scans on the foreground path even when the pass would have skipped rewrite work entirely.
Pushed. This adds a frozen-copy maintenance replay path and folds in two runtime fixes validated against a real copied Celestia DB.
Local gates on this head:
Most important frozen-copy result on a real staged Celestia DB copy:
This is the first frozen-copy replay where staged confirm on a real copied Celestia DB consumed exactly one queued segment, GC deleted it, restaged the remaining tail, and returned to idle instead of chain-draining through repeated
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
```go
// Checkpoint-coupled resumable work should consume one chunk, then
// re-confirm the remaining tail later instead of draining multiple
// queued segments back-to-back in the same hot ingest window.
if hadRewriteQueue && !opts.skipCheckpoint {
```
This restage trigger is currently if hadRewriteQueue && !opts.skipCheckpoint, which will also fire on periodic maintenance passes (default opts.skipCheckpoint=false) whenever there is remaining queued debt. That seems broader than the intent described in the nearby comment ("checkpoint-coupled") and the PR description (restage after checkpoint-kick). Consider tightening the condition (e.g., require opts.bypassQuiet / checkpoint-kick context) so periodic resume behavior doesn’t unexpectedly transition into staged-confirm draining.
```diff
-if hadRewriteQueue && !opts.skipCheckpoint {
+if hadRewriteQueue && !opts.skipCheckpoint && opts.bypassQuiet {
```
```go
db, recorder, initialQueue := seedRealBackendRewriteQueueScenario(b, 25*time.Millisecond)
b.StopTimer()
```
Benchmark setup is included in the timed portion and the helper uses tb.Cleanup, so each iteration schedules DB/backend closes only at benchmark end. This can keep many DBs open during the run (potential FD/memory exhaustion) and also skews the benchmark timing. Stop the timer before calling seedRealBackendRewriteQueueScenario and explicitly close/cleanup the returned DB/backend per iteration (e.g., return a cleanup func instead of using tb.Cleanup in the helper when running under *testing.B).
```diff
-db, recorder, initialQueue := seedRealBackendRewriteQueueScenario(b, 25*time.Millisecond)
-b.StopTimer()
+b.StopTimer()
+db, recorder, initialQueue := seedRealBackendRewriteQueueScenario(b, 25*time.Millisecond)
```
```go
case len(seedPlan.SelectedSegments) > 0:
	stageObservedAt := int64(0)
	if *seedStagePending {
		stageObservedAt = time.Now().Add(-*seedStageObservedAgo).UnixNano()
```
-seed-stage-observed-ago is described as a duration "in the past", but negative values will produce an observedAt timestamp in the future, delaying stage confirmation in a confusing way. Consider validating that seedStageObservedAgo >= 0 (or clamping to 0) before computing stageObservedAt.
```diff
-stageObservedAt = time.Now().Add(-*seedStageObservedAgo).UnixNano()
+observedAgo := *seedStageObservedAgo
+if observedAgo < 0 {
+	observedAgo = 0
+}
+stageObservedAt = time.Now().Add(-observedAgo).UnixNano()
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 52041e4152
```go
if !db.valueLogRetained(path) {
	continue
}
if db.cleanupOrphanedRetainedValueLog(path) {
```
Respect backend snapshot pins before deleting processed segments
When a caller keeps a cached AcquireSnapshot()/iterator open across rewrite maintenance, the backend snapshot still expects the old value-log file to remain readable until its pinned ValueLogSet is released. This new path bypasses the backend manager's zombie/pin handling and deletes the file directly via cleanupOrphanedRetainedValueLog, using the cache's separate valueLogReader; as a result, reads from the older snapshot can start failing with missing value-log segments even though the snapshot is still live.
Pushed. This narrows stage-confirm further: when there is already a persisted staged rewrite ledger, the due stage-confirm path now uses that ledger directly instead of issuing a hot-path explicit-
Why this matters:
Local gates:
Frozen-copy delta on the same staged Celestia DB copy:
Most important trace line on the new head:
So this change preserves the good staged-tail behavior from
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 32578e987d
```go
dropped := len(rewriteQueue)
if err := db.setVlogGenerationRewriteLedger(nil); err != nil {
```
Preserve queue-only rewrite plans during refresh
This prune path treats any refresh result with SelectedSegments == 0 as “nothing remains”, but ValueLogRewritePlan still allows SourceFileIDs without SelectedSegments (TreeDB/db/vlog_rewrite.go:54-60). If a periodic/reopen refresh gets that queue-only shape from the planner, the code clears the persisted resume queue here even though the planner is still returning actionable source IDs, so the remaining rewrite debt is silently lost until some later full replan recreates it.
```go
var rewriteLedger []backenddb.ValueLogRewritePlanSegment
if len(rewriteQueue) > 1 {
	rewriteLedger, _ = db.currentVlogGenerationRewriteLedger()
}
```
Load the staged ledger for one-segment confirmations
The stale-ratio path can now persist a stage-pending ledger with exactly one queued file, but this code only loads the persisted ledger when len(rewriteQueue) > 1. On the next due rewrite_stage_confirm pass, stageConfirmUsePersistedLedger therefore stays false for one-entry stages, so any transient ValueLogRewritePlan timeout/error leaves hasExecutableRewriteQueue false and the pass does nothing even though the exact segment to rewrite is already on disk. Multi-segment confirms avoid that planner dependency; single-segment confirms do not.
Pushed.
What changed:
Local gate:
Frozen-copy replay result on the completed
This explains the earlier WAL frozen replay failure mode: the persisted queued debt on reopen could already be stale/deleted, and we were spending a maintenance opportunity on nonexistent source IDs. This patch makes that case cheap.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
```go
func (db *DB) cleanupProcessedRetainedRewriteSources(reason uint32, processedRewriteIDs []uint32) {
	if db == nil || len(processedRewriteIDs) == 0 || !db.valueLogEnabled() {
		return
	}
	liveIDs, err := db.collectValueLogLiveIDsUntil(db.lastForegroundWriteUnixNano.Load())
	if err != nil {
		db.debugVlogMaintf(
			"rewrite_retained_cleanup_scan_err reason=%s source_ids=%d err=%v",
			vlogGenerationReasonString(reason),
			len(processedRewriteIDs),
			err,
		)
		return
```
cleanupProcessedRetainedRewriteSources() calls collectValueLogLiveIDsUntil(), which performs a full pointer-projection scan of the tree (and leaf-ref scan) to rebuild live segment IDs. This runs after every online rewrite pass (and after GC), so it can add an O(size-of-DB) scan to a maintenance step that’s otherwise intended to be bounded. Consider avoiding the full scan here (e.g., rely on rewrite outcomes + backend segment presence/zombie state, or defer retained cleanup to existing periodic retained pruning where the scan is already expected), or at least gate the scan behind a debug/env toggle or a cheap heuristic (only when the retained set actually contains the processed IDs).
```go
db, recorder, initialQueue := seedRealBackendRewriteQueueScenario(b, 25*time.Millisecond)
b.StopTimer()
```
BenchmarkVlogGenerationRewriteQueueDrainCheckpointKick includes seedRealBackendRewriteQueueScenario() in the timed portion (it’s called before b.StopTimer()). This makes the benchmark’s ns/op dominated by setup (DB creation + synthetic workload) rather than the checkpoint-kick drain behavior the metrics are trying to capture. Move the seeding/setup under b.StopTimer()/b.StartTimer() (or use b.ResetTimer() after setup) so the benchmark timing reflects only the drain loop.
```diff
-db, recorder, initialQueue := seedRealBackendRewriteQueueScenario(b, 25*time.Millisecond)
-b.StopTimer()
+b.StopTimer()
+db, recorder, initialQueue := seedRealBackendRewriteQueueScenario(b, 25*time.Millisecond)
```
```go
beforeState, err := db.DebugValueLogGenerationState()
if err != nil {
	fatalf("debug state before run: %v", err)
}
beforeStats := filterVlogGenerationStats(db.Stats())

report := valueLogMaintOnceReport{
	Dir:         dir,
	Mode:        *mode,
	BeforeState: beforeState,
	BeforeStats: beforeStats,
}
```
runVlogMaintOnce records report.BeforeState/BeforeStats before applying -clear-state or -seed-from-plan. When either flag is used, the printed/JSON "before_state" no longer reflects the actual state that the maintenance pass ran with, which can make offline forensics misleading. Consider capturing the initial state separately (e.g., InitialState) or recomputing BeforeState/BeforeStats after seeding/clearing and before DebugRunValueLogGenerationMaintenanceOnce().
💡 Codex Review
Lines 12339 to 12343 in d6d0a50
TREEDB_DISABLE_VLOG_GENERATION_DEFERRED now disables the timer-based stage/age schedulers, but this exit path ignores the same flag and can immediately enqueue rewrite_stage_confirm_exit or rewrite_age_blocked_exit after any acquired pass. That means treemap vlog-maint-once -disable-auto-deferred can still run extra maintenance in the background on copied DBs with overdue staged/age-blocked debt, defeating the new "one pass" offline-analysis mode.
```go
	}
	db.debugVlogMaintf("gc_after_rewrite_done reason=%s dur_ms=%.3f", vlogGenerationReasonString(reason), float64(time.Since(gcStart).Microseconds())/1000)
}
db.cleanupProcessedRetainedRewriteSources(reason, processedRewriteIDs)
```
Avoid full live-ID scans after each queued rewrite
When processedRewriteIDs is non-empty, this new call synchronously runs cleanupProcessedRetainedRewriteSources, and that helper immediately calls collectValueLogLiveIDsUntil to rescan the entire tree for live value-log IDs. On large DBs with checkpoint-kick or resumed rewrite debt, every single-segment rewrite now pays an extra full pointer scan before the maintenance pass returns, so the new restage logic no longer keeps the hot-window latency bounded to one rewritten segment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c869e0a08d
```go
if hasPlanner && (db.shouldRefreshQueuedVlogGenerationRewriteLedger(opts, stagePending, rewriteQueue, rewriteLedger) || stageConfirmRefresh) {
	queueMinStaleRatio := db.vlogGenerationRewriteMinStaleRatioForQueuedDebt(totalBytes, vlogGenerationReasonRewriteResume)
	refreshedPlan, refreshErr := db.refreshQueuedVlogGenerationRewritePlan(planner, rewriteQueue, queueMinStaleRatio, opts)
```
Replan staged stale-ratio confirms with the trigger ratio
When a stale-ratio pass stages exactly one segment, the later rewrite_stage_confirm refresh goes through this path and replans it with vlogGenerationRewriteMinStaleRatioForQueuedDebt(..., vlogGenerationReasonRewriteResume), which is 50% by default. That is stricter than the stale-ratio trigger that created the stage (default 20%), so any staged segment in the 20%-50% band is filtered out here even though it was intentionally staged. Because the stageConfirmRefresh empty-selection branch keeps the staged queue and skips the sparse-plan fallback, that debt can get stuck pending forever instead of ever being rewritten.
```go
for i := 0; i < k+1; i++ {
	cur := binary.LittleEndian.Uint32(prefix[off : off+4])
```
Validate grouped RIDs before fast-decoding compressed frames
This new ReadAppend fast path skips over the grouped RID table and only parses offsets, so it no longer rejects frames whose RID entries are structurally invalid. The previous fallback (ReadAtWithDict) validated each RID, which means a compressed grouped record with zeroed/corrupt RID bytes but otherwise readable payload now returns data through ReadAppend/ReadUnsafeAppendBatch instead of ErrCorrupt (including when the bad frame still has a matching CRC, or when checksum verification is disabled). That weakens value-log integrity on iterator and batch-read paths.
Pull request overview
Copilot reviewed 37 out of 37 changed files in this pull request and generated 2 comments.
```go
// AcquireSealedLazyMmapBudget raises this manager's sealed lazy-mmap limits
// until the returned release func is called. It only ever increases the global
// defaults for this manager instance; ordinary readers keep using the existing
// defaults unless a caller explicitly acquires a higher maintenance budget.
func (m *Manager) AcquireSealedLazyMmapBudget(minSegments int, minBytes int64) func() {
	if m == nil {
		return func() {}
	}
	if minSegments < 0 {
		minSegments = 0
	}
	if minBytes < 0 {
		minBytes = 0
	}
	m.mu.Lock()
	m.sealedLazyMmapBudgetDepth++
	if minSegments > m.sealedLazyMmapBudgetSegments {
		m.sealedLazyMmapBudgetSegments = minSegments
	}
	if minBytes > m.sealedLazyMmapBudgetBytes {
		m.sealedLazyMmapBudgetBytes = minBytes
	}
	m.mu.Unlock()
	return func() {
		m.mu.Lock()
		if m.sealedLazyMmapBudgetDepth > 0 {
			m.sealedLazyMmapBudgetDepth--
			if m.sealedLazyMmapBudgetDepth == 0 {
				m.sealedLazyMmapBudgetSegments = 0
				m.sealedLazyMmapBudgetBytes = 0
			}
		}
		m.mu.Unlock()
	}
}
```
AcquireSealedLazyMmapBudget uses a single depth counter and keeps the maximum requested segments/bytes until all acquisitions are released. With nested acquisitions, releasing the inner budget will not restore the previous (smaller) budget while the outer acquisition is still active, even though the comment says the override lasts "until the returned release func is called".
If nested acquisition is possible, consider tracking prior values per-acquisition in the returned closure (stack/restore pattern), so each release restores the manager budget to the max of remaining active budgets rather than only resetting when depth reaches 0.
```go
if canceledErr == nil && restrictSource && haveCachedReferencedBefore {
	referencedAfter = rewriteCleanupDerivedReferencedSegments(cachedReferencedBefore, sourceIDs, newValueIDs)
} else {
	runRewriteCleanupReferencedScanHook()
	referencedAfter, err = db.referencedValueLogSegments(context.Background())
	if err != nil {
		return stats, err
	}
```
rewriteCleanupDerivedReferencedSegments derives referencedAfter by cloning the cached referenced set and unconditionally deleting all sourceIDs. This is only safe if the online rewrite guarantees it rewrote all references to those segments. Since ValueLogRewriteOnline only scans the user tree (and only rewrites leaf-ref pages separately), segments in sourceIDs could still be referenced elsewhere (e.g., the system tree or other reachability sources), and would then be incorrectly omitted from referencedAfter, potentially causing them to be marked zombie and deleted.
Consider falling back to referencedValueLogSegments (full scan) for this optimization, or augment the derivation with a targeted verification pass that checks for remaining references to sourceIDs outside the rewritten keyspace before deleting them from the referenced set.
```diff
-if canceledErr == nil && restrictSource && haveCachedReferencedBefore {
-	referencedAfter = rewriteCleanupDerivedReferencedSegments(cachedReferencedBefore, sourceIDs, newValueIDs)
-} else {
-	runRewriteCleanupReferencedScanHook()
-	referencedAfter, err = db.referencedValueLogSegments(context.Background())
-	if err != nil {
-		return stats, err
-	}
+runRewriteCleanupReferencedScanHook()
+referencedAfter, err = db.referencedValueLogSegments(context.Background())
+if err != nil {
+	return stats, err
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8236397431
```go
lane, _ := valuelog.DecodeFileID(segment.fileID)
current, ok := newestByLane[lane]
if !ok || segment.seq > current.seq {
	newestByLane[lane] = segment
```
Skip empty newest segments when deriving rewrite RID start
If a lane's highest-seq segment exists but has not received any records yet, this optimization picks that empty file and ignores the previous sealed segment that still holds the lane's highest RID. That state is reachable in normal cached mode because rotateValueLogWriterMuHeld creates and registers/promotes a new segment before the first append (TreeDB/caching/db.go:16282-16301). In a direct DB.ValueLogRewriteOnline call where ReserveRIDs is nil, nextRewriteRIDStart can therefore return a too-low start RID and the rewrite will emit duplicate RIDs; reopen/recovery later fails on the duplicate check in TreeDB/db/wal_recovery.go:193-197.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b05a8b4564
```go
if db.cached != nil {
	if err := db.Checkpoint(); err != nil {
```
Keep the plan-only API from checkpointing cached DBs
DB.ValueLogRewritePlan is now called from the new vlog-maint-exit-loop reseed/probe path, but this wrapper unconditionally runs Checkpoint() whenever the DB is cached. In cached mode Checkpoint() is not a read-only preparation step: caching.DB.Checkpoint() can flush memtables and then call maybeKickVlogGenerationMaintenanceAfterCheckpoint, so a caller that only wants to inspect candidate segments can mutate the DB and even start rewrite/GC before the explicit maintenance pass runs. That makes the new offline/debug commands nondeterministic and also gives the new public planning API surprising side effects.
Pull request overview
Copilot reviewed 43 out of 43 changed files in this pull request and generated 1 comment.
	fatalf("cpuprofile: %v", err)
}
profFile = f
runtimepprof.StartCPUProfile(profFile)
runtime/pprof.StartCPUProfile returns an error, but this call ignores it. Please handle the error (and close the file on failure) so the command doesn’t silently run without profiling or leak the file descriptor when profiling can’t be started.
Suggested change:
- runtimepprof.StartCPUProfile(profFile)
+ if err := runtimepprof.StartCPUProfile(profFile); err != nil {
+ 	_ = profFile.Close()
+ 	fatalf("cpuprofile: %v", err)
+ }
💡 Codex Review
Reviewed commit: 64711a9b3b
	}
	recs[i] = valuelog.Record{RID: rid, Value: leafPages[start+i]}
}
ptrs, _, err := w.w.AppendFrameWithStatsInto(0, nil, recs, dst)
Don't batch leaf pages into multi-record grouped frames
page.EncodeLeafRef only preserves FileID and Offset; page/leaf_ref.go reconstructs every leaf ref with grouped sub-index 0 under the explicit K=1 assumption. This new path writes up to 8 leaf pages in one AppendFrameWithStatsInto call, so internal/bulk/builder.go:194-203 can now encode nonzero-subindex pointers as leaf refs during offline rewrite. On any DB using IndexOuterLeavesInValueLog with more than one rewritten leaf page per batch, the 2nd+ child refs decode back to the first leaf in that frame after reopen, corrupting tree navigation and lookups.
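The aliasing failure mode can be illustrated with a toy model. This is a sketch, not the TreeDB encoder: `ptr`, `appendFrame`, and `appendLeafPagesK1` are hypothetical, and the point is only that a leaf ref which preserves `(fileID, offset)` but drops the grouped sub-index stays unambiguous only when each leaf page gets its own K=1 frame.

```go
package main

import "fmt"

// ptr models a value-log pointer; subIdx is what a leaf ref loses when it
// only preserves (fileID, offset).
type ptr struct {
	fileID uint32
	offset uint32
	subIdx int
}

// appendFrame models one grouped append: all n records share the frame's
// offset and differ only by sub-index.
func appendFrame(fileID, offset uint32, n int) []ptr {
	out := make([]ptr, n)
	for i := range out {
		out[i] = ptr{fileID: fileID, offset: offset, subIdx: i}
	}
	return out
}

// appendLeafPagesK1 issues one single-record frame per leaf page, so every
// resulting pointer is distinguishable by (fileID, offset) alone, which is
// all a leaf ref can carry.
func appendLeafPagesK1(fileID, start, frameBytes uint32, n int) []ptr {
	out := make([]ptr, 0, n)
	off := start
	for i := 0; i < n; i++ {
		out = append(out, appendFrame(fileID, off, 1)...)
		off += frameBytes
	}
	return out
}

func main() {
	grouped := appendFrame(7, 0, 3)
	// Dropping subIdx makes records 1 and 2 alias record 0.
	fmt.Println(grouped[1].offset == grouped[0].offset) // true: aliasing

	perPage := appendLeafPagesK1(7, 0, 4096, 3)
	fmt.Println(perPage[1].offset == perPage[0].offset) // false: distinct refs
}
```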
💡 Codex Review
Reviewed commit: 7f484cef31
		Value: leafPages[i],
	}
}
return l.db.appendValueLog(lane, 0, nil, records, journalDurabilityNone, true)
Do not group multiple outer-leaf pages into one frame
When IndexOuterLeavesInValueLog is enabled and a split/merge or bulk build persists 2+ outer leaf pages at once, this funnels them through a single appendValueLog call. That produces grouped pointers, but page.EncodeLeafRef/DecodeLeafRef in TreeDB/page/leaf_ref.go only preserve (FileID, Offset) and explicitly assume leaf pages are stored as K=1, so every pointer after the first collapses to the same leaf-ref ID as sub-index 0. In practice, zipper.mergeLeaf and the bulk builder can now write sibling/root child refs that already alias the wrong leaf in memory and always decode to the wrong page after reopen.
Pull request overview
Copilot reviewed 52 out of 52 changed files in this pull request and generated 1 comment.
var (
	leafPage []byte
	err      error
)
if toer, ok := c.leafReader.(unsafeToReader); ok {
	leafPage, _, err = toer.ReadUnsafeTo(ptr, c.leafScratch[:0])
	if err == nil && len(leafPage) > 0 && len(leafPage) <= cap(c.leafScratch) {
		c.leafScratch = leafPage[:0]
	}
In the leaf-ref rewrite path, leafScratch starts nil, so calling ReadUnsafeTo(ptr, c.leafScratch[:0]) won’t provide reusable capacity and the cap(c.leafScratch) check prevents retaining the returned buffer. Consider preallocating c.leafScratch to make([]byte, 0, page.PageSize) (or growing when cap < page.PageSize) before the first ReadUnsafeTo call so the scratch reuse optimization actually takes effect.
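The suggested preallocation can be sketched as below; `pageSize` stands in for `page.PageSize` and `growScratch` is a hypothetical helper, not the real code path:

```go
package main

import "fmt"

const pageSize = 4096 // illustrative stand-in for page.PageSize

// growScratch ensures a reusable scratch buffer has at least pageSize
// capacity before the first read, so a "retain returned buffer when
// len(buf) <= cap(scratch)" check can actually succeed instead of always
// failing against a nil scratch.
func growScratch(scratch []byte) []byte {
	if cap(scratch) < pageSize {
		return make([]byte, 0, pageSize)
	}
	return scratch[:0]
}

func main() {
	var leafScratch []byte // starts nil, as in the review comment
	leafScratch = growScratch(leafScratch)
	fmt.Println(cap(leafScratch) >= pageSize) // true: reuse can now kick in
}
```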
💡 Codex Review
gomap/TreeDB/db/vlog_rewrite.go, lines 1081 to 1087 (commit d020d3c)
When IndexOuterLeavesInValueLog is enabled, leaf refs can only encode offsets up to page.LeafRefMaxOffset, and the offline path later in this file already clamps rewrite segment size for that reason. The online path here still only applies the packed-pointer cap, so a caller that sets MaxSegmentBytes above ~512 MiB can fill a rewrite segment past the LeafRef offset limit and then fail in rewriteLeafRefsOnline when page.EncodeLeafRef sees an unrepresentable offset. This only shows up on DBs with outer leaves in the value log, but in that configuration it turns a valid online rewrite request into a mid-pass failure.
for _, id := range candidateFileIDs {
	f := files[id]
	if _, ok := active[id]; ok {
		continue
Don't discard explicit active source IDs
With SourceFileIDs plus stale filters, rewritePlanNeedsLiveEstimate now routes explicit IDs through this sparse-selection loop. This unconditional active check silently drops any requested segment that is still marked active, so ValueLogRewritePlan/ValueLogRewriteOnline can return an empty selection even when the caller explicitly named that file ID and it satisfies the stale-byte/ratio filter. That regression is specific to the new filtered-SourceFileIDs path; before this change, explicit IDs were honored as requested.
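A selection shape that honors explicit IDs could look like the sketch below. The names (`selectStale`, the map arguments) are illustrative, not the TreeDB selection code; the point is that the active-skip applies only to auto-discovered candidates:

```go
package main

import "fmt"

// selectStale filters candidate segment IDs against the active set, but
// keeps IDs the caller named explicitly (SourceFileIDs-style input); only
// auto-discovered candidates are dropped for being active.
func selectStale(candidates []uint32, active, explicit map[uint32]bool) []uint32 {
	out := make([]uint32, 0, len(candidates))
	for _, id := range candidates {
		if active[id] && !explicit[id] {
			continue // auto-discovered and still receiving writes: skip
		}
		out = append(out, id)
	}
	return out
}

func main() {
	active := map[uint32]bool{3: true}
	explicit := map[uint32]bool{3: true}
	// Segment 3 is active but explicitly requested, so it is kept.
	fmt.Println(selectStale([]uint32{1, 3}, active, explicit)) // [1 3]
}
```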
💡 Codex Review
Reviewed commit: 25da6d2fe7
lane := &l.db.lanes[l.laneID]
rid := l.db.nextRID.Add(1)
- ptr, retainPath, err := l.db.appendValueLogOneInternal(lane, 0, nil, rid, leafPage, journalDurabilityNone, false)
+ ptr, retainPath, err := l.db.appendValueLogOneInternal(lane, 0, nil, rid, leafPage, journalDurabilityNone, false, true)
Write single outer-leaf pages as grouped records
When IndexOuterLeavesInValueLog is enabled and compression falls back to the non-block path (for example ValueLogCompression=Off), this AppendLeafPage call still goes through appendValueLogOneInternal, which writes a plain record instead of a grouped K=1 frame. After this change page.DecodeLeafRef always reconstructs grouped pointers, so the empty/single-leaf builder paths in TreeDB/internal/bulk/builder.go:61 and :305 now persist leaf refs that reopen as corrupt because leaf-page reads try to parse a grouped frame that was never written.
if opts.IndexOuterLeavesInValueLog {
	leafRefMaxSegmentBytes := int64(page.LeafRefMaxOffset) - 4
	if valueLogMaxSegmentBytes == 0 || leafRefMaxSegmentBytes < valueLogMaxSegmentBytes {
		valueLogMaxSegmentBytes = leafRefMaxSegmentBytes
Clamp rewrite segment bytes to the new LeafRef offset limit
These new LeafRefMaxOffset clamps only protect normal cached writes. Online rewrite still uses opts.MaxSegmentBytes verbatim in TreeDB/db/vlog_rewrite.go:1077-1089, and cached generational maintenance feeds that field from db.valueLogGenerationWarmTarget at TreeDB/caching/db.go:13257-13261. In deployments with IndexOuterLeavesInValueLog and a warm/max rewrite segment target above the new ~512 MiB leaf-ref ceiling, a rewrite can now fail mid-run once a copied leaf page lands past the 29-bit offset limit, because page.EncodeLeafRef rejects the rewritten pointer.
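Mirroring that clamp on the online path could be sketched as follows. The constant and function names here are illustrative (`leafRefMaxOffset` stands in for `page.LeafRefMaxOffset`, assumed from the 29-bit limit the review describes), not the actual fix:

```go
package main

import "fmt"

// leafRefMaxOffset models the 29-bit leaf-ref offset ceiling (~512 MiB)
// described in the review; the real constant is page.LeafRefMaxOffset.
const leafRefMaxOffset = int64(1) << 29

// clampRewriteSegmentBytes applies the same ceiling the offline path uses
// to an online rewrite request: when outer leaves live in the value log,
// the segment target is capped so every copied leaf page stays encodable
// as a leaf ref instead of failing mid-pass.
func clampRewriteSegmentBytes(maxSegmentBytes int64, outerLeavesInVlog bool) int64 {
	if !outerLeavesInVlog {
		return maxSegmentBytes
	}
	limit := leafRefMaxOffset - 4
	if maxSegmentBytes == 0 || maxSegmentBytes > limit {
		return limit
	}
	return maxSegmentBytes
}

func main() {
	fmt.Println(clampRewriteSegmentBytes(1<<30, true) == (1<<29)-4) // true: capped
	fmt.Println(clampRewriteSegmentBytes(1<<20, true) == 1<<20)     // true: under limit
}
```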
Tracking the next sprint off a single authority issue: #844

This captures:
- the current leaf-frame fix status
- the remaining ordinary-value write-shape problem
- the roadmap
- the final validation matrix (, , post-run data-folder analysis, post-offline-rewrite analysis, and unified-bench gates).
Pull request overview
Copilot reviewed 58 out of 58 changed files in this pull request and generated 1 comment.
if z.outerLeavesInValueLog && z.leafPageBatchLog != nil {
	data := append([]byte(nil), target.Data()...)
	pendingLeafPersists = append(pendingLeafPersists, pendingLeafPersist{
		data:     data,
		root:     target == builder,
		splitIdx: pendingSplitIdx,
	})
	if target == builder {
		rootPersisted = true
	}
	if len(pendingLeafPersists) >= 8 {
		return 0, flushPendingLeafPersists()
	}
	return 0, nil
}
In mergeLeaf's batched outer-leaf persist path, the split leaf builder/page buffers are never released back to their pools. Unlike the non-batched path (which releases pooled builders/pages after persisting when target != builder), the batched branch copies target.Data() into pendingLeafPersists but then returns without releasing target when it's a pooled split leaf. This will accumulate unreleased pooled builders/outerLeafBuildPages across splits and can materially increase memory usage during large merges. After copying the page data, release pooled resources for non-root targets (and clear the split's pooled flags) similarly to the non-batched persist path.
💡 Codex Review
Reviewed commit: 45e774698f
if len(pendingLeafPersists) >= 8 {
	return 0, flushPendingLeafPersists()
}
return 0, nil
Release pooled split pages on batched leaf persistence
In the new batched outer-leaf path, persistTarget returns from this branch before reaching the cleanup block that releases target and targetOuterLeafData back to their pools. On split-heavy merges when outerLeavesInValueLog and leafPageBatchLog are enabled, each persisted non-root split leaf leaves behind pooled builder/scratch allocations, so memory use grows with split count and pooling effectiveness is lost.
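The copy-then-release discipline the comment asks for can be sketched with a toy pool; `bufPool` and `persistBatched` are illustrative stand-ins for the zipper's pooled builders, not the real types:

```go
package main

import "fmt"

// bufPool tracks outstanding pooled page buffers so the leak described in
// the review is observable.
type bufPool struct{ outstanding int }

func (p *bufPool) get() []byte  { p.outstanding++; return make([]byte, 0, 4096) }
func (p *bufPool) put(_ []byte) { p.outstanding-- }

// persistBatched queues an owned copy of the page for a deferred flush and
// immediately releases the pooled buffer for non-root targets, mirroring
// the cleanup the non-batched persist path already performs.
func persistBatched(p *bufPool, page []byte, isRoot bool, pending *[][]byte) {
	*pending = append(*pending, append([]byte(nil), page...)) // queued copy owns the bytes
	if !isRoot {
		p.put(page) // release now instead of accumulating across splits
	}
}

func main() {
	var p bufPool
	var pending [][]byte
	for i := 0; i < 4; i++ {
		persistBatched(&p, p.get(), false, &pending)
	}
	fmt.Println(p.outstanding, len(pending)) // 0 4: nothing leaked, all queued
}
```

Because the queued entry holds its own copy of the bytes, returning the pooled buffer right away is safe even though the flush happens later.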
if len(in) == 0 {
	return nil
Preserve empty referenced sets in ref-cache cloning
Returning nil for an empty referenced-ID set collapses a valid "known empty" scan result into the same sentinel used for "no cached value." Downstream cache lookups check for non-nil refs, so successful scans that find zero live segments become cache misses and trigger repeated full reachability rescans in rewrite/GC maintenance, increasing latency on fully-rewritten or near-empty datasets.
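A clone that preserves the nil/empty distinction can be sketched as below; `cloneRefs` is a hypothetical helper showing the shape, not the actual ref-cache code:

```go
package main

import "fmt"

// cloneRefs copies a referenced-segment-ID set while preserving the
// difference between "no cached scan result" (nil) and "scan found zero
// live segments" (empty, non-nil), so downstream non-nil checks still
// treat a known-empty result as a cache hit.
func cloneRefs(in map[uint32]struct{}) map[uint32]struct{} {
	if in == nil {
		return nil
	}
	out := make(map[uint32]struct{}, len(in)) // len 0 still yields non-nil
	for id := range in {
		out[id] = struct{}{}
	}
	return out
}

func main() {
	fmt.Println(cloneRefs(nil) == nil)                   // true: a miss stays a miss
	fmt.Println(cloneRefs(map[uint32]struct{}{}) != nil) // true: empty stays a hit
}
```

The fix hinges on checking `in == nil` rather than `len(in) == 0`, since both nil and empty maps have length zero in Go.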
Summary
This PR promotes the decomposed
`A + C` candidate path on top of #842 and explicitly leaves planner-refresh `B` out of the merge path.
- `A`: retained processed-source cleanup after online rewrite + backend explicit `SourceFileIDs` live-byte estimation
- `C`: checkpoint-coupled queued rewrite restage after one resumed chunk
- `B`: excluded for now; it improved debt slightly but regressed wall time materially in exact-height `fast`
Why this exists
The mixed local maintenance head had one real problem: it bundled multiple ideas, so debt wins and wall regressions were not attributable.
This branch is the cleaned candidate path after decomposition:
- `A` is a keep now
- `C` is a keep now
- `B` is preserved for redesign, not merge
What changed
Layer A
- `SourceFileIDs` rewrite planning estimates live/stale bytes correctly
Layer C
- checkpoint-kick resume processes one queued segment, then restages the remaining debt for later confirmation
Deterministic evidence
Real-backend test proof
`TestVlogGenerationRewriteQueue_CheckpointKickRestagesRemainingRealBackendDebt` now proves on real backend source IDs that:
Checkpoint-kick latency harness
Added benchmark:
`BenchmarkVlogGenerationRewriteQueueDrainCheckpointKick`
One-shot comparison:
A + C
`initial_queue=4`, `max_kick_ms=68.14`, `rewrite_calls=1`, `remaining_queue=3`, `stage_pending=1`, `total_window_ms=68.22`
A only
`initial_queue=4`, `max_kick_ms=84.29`, `rewrite_calls=4`, `remaining_queue=0`, `stage_pending=0`, `total_window_ms=302.1`
Interpretation:
`C` is not speculative; it bounds a single checkpoint-kick window to one rewrite and restages the remaining tail instead of draining four rewrites through the same hot window.
Local validation
- `GOWORK=off go test ./TreeDB/db -count=1 -timeout=10m`
- `GOWORK=off go test ./TreeDB/caching -count=1`
- `go vet ./...`
`run_celestia` evidence
Layer A exact-height `fast`
- `/home/mikers/.celestia-app-mainnet-treedb-20260320062122`
- `duration_seconds=294`, `max_rss_kb=11078624`, `end_app_bytes=5375716247`, `vlog-rewrite:2564040376`
A + C exact-height `fast`
- `/home/mikers/.celestia-app-mainnet-treedb-20260320080727`
- `duration_seconds=315`, `max_rss_kb=11627000`, `end_app_bytes=5108730308`, `vlog-gc:4796275739`, `vlog-rewrite:2480106610`, `du -sb application.db`: `2519003499`
Important caveat:
- `run_celestia` runs on `A + C` did not actually hit online rewrite/restage in `node.log`
- `C` is now strongly proven by deterministic backend-backed tests and benchmark evidence, but still needs runtime attribution in a sample that actually exercises queued resume activity
Status
This PR is the promoted candidate path.
Further work should iterate here, not on the old mixed forensic branch.
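As context for the promoted `C` behavior, the single-kick restage policy can be sketched as below. This is an illustrative model, not the TreeDB implementation: the queue of segment IDs and `kickOnce` are hypothetical, chosen only to mirror the benchmark numbers above (`initial_queue=4` producing `rewrite_calls=1`, `remaining_queue=3`).

```go
package main

import "fmt"

// kickOnce models the layer-C policy: a checkpoint kick rewrites exactly
// one queued segment and restages the remaining debt for a later
// confirmation pass, instead of draining the whole queue through one hot
// checkpoint window.
func kickOnce(queue []uint32) (process, restaged []uint32) {
	if len(queue) == 0 {
		return nil, nil
	}
	return queue[:1], queue[1:]
}

func main() {
	queue := []uint32{10, 11, 12, 13} // initial_queue=4
	process, restaged := kickOnce(queue)
	fmt.Println(len(process), len(restaged)) // 1 3: one rewrite, three restaged
}
```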