caching: pressure-aware pool retention governor#826
caching: pressure-aware pool retention governor#826snissn wants to merge 4 commits intopr/zipper-nodeview-scratch-standalonefrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d20dd613ba
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Adds a heap-pressure–aware governor to TreeDB/caching pooling so the system retains fewer pooled buffers/slices under sustained heap pressure, while keeping baseline budgets unchanged in normal conditions.
Changes:
- Introduces sampled heap-pressure snapshots (with
GOMEMLIMITawareness) and scales effective retention budgets underhigh/criticalpressure. - Applies effective budgeting to batch copy arena retention and entry-slice retention, and adds periodic trimming of retained entry-slice leases under pressure.
- Exposes new pool-pressure and (base vs effective) pool budget diagnostics via
DB.Stats(), and adds targeted tests.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| TreeDB/caching/db.go | Adds pool-pressure sampling/governor, hooks effective budgets into pooling paths, trims entry-slice leases under pressure, and publishes new Stats() diagnostics. |
| TreeDB/caching/pool_pressure_test.go | Adds tests covering budget scaling, lease trimming behavior, and batch arena effective budget scaling. |
Comments suppressed due to low confidence (1)
TreeDB/caching/db.go:694
putBatchArenastill puts the buffer intobatchArenaPoolseven when the effective retention budget is 0 (e.g., under critical pressure). That means retention is not actually disabled andbatchArenaPoolBytesstops tracking what’s being retained, allowing unbounded pool growth under pressure. Gate thePutonbudget > 0(or early-return when budget <= 0) so critical pressure truly disables additional retention and accounting remains consistent.
if budget := currentBatchArenaRetentionBudgetBytes(); budget > 0 {
size := int64(cap(buf))
noteEpoch := false
for {
held := batchArenaPoolBytes.Load()
if held+size > budget {
before := held
maybeResetBatchArenaPoolBytesAfterGC()
held = batchArenaPoolBytes.Load()
if held == before || held+size > budget {
return
}
continue
}
if batchArenaPoolBytes.CompareAndSwap(held, held+size) {
noteEpoch = held == 0
break
}
}
if noteEpoch {
noteBatchArenaPoolGC(batchArenaPoolNumGC())
}
}
batchArenaPools[idx].Put(buf[:0])
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Pull request overview
Introduces heap-pressure–aware retention budgeting for key caching pools in TreeDB/caching, aiming to reduce RSS/heap retention during restore-like spikes while keeping normal-path behavior stable.
Changes:
- Add a sampled heap pressure snapshot (
normal/high/critical) (withGOMEMLIMITawareness) and scale effective pool retention budgets under pressure. - Apply pressure-scaled effective budgets to batch copy arena retention and entry-slice retention, plus periodic trimming of retained entry-slice leases under pressure.
- Expose new pressure and pool-budget diagnostics via
DB.Stats()and wire selected keys intocmd/unified_benchoutput; adjust a valuelog test to avoid remap goroutine races.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
TreeDB/caching/db.go |
Implements heap-pressure sampling, budget scaling, lease trimming, and publishes new Stats() diagnostics/counters. |
TreeDB/caching/pool_pressure_test.go |
Adds tests for budget scaling and trimming behavior across pressure levels. |
cmd/unified_bench/main.go |
Prints new caching/memory pressure stat keys in the concise TreeDB cache stats output. |
TreeDB/internal/valuelog/manager_test.go |
Updates test setup to force grouped fallback path without spawning async remap goroutines under -race. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Pull request overview
Adds a heap-pressure–aware “governor” to TreeDB/caching pooling so the system retains less pooled memory during sustained heap pressure, while keeping baseline (normal-pressure) behavior stable.
Changes:
- Introduces a sampled heap-pressure snapshot (
normal/high/critical) and scales effective retention budgets accordingly. - Applies scaled budgets to batch copy-arena retention and entry-slice retention, including periodic trimming of retained entry-slice leases under pressure.
- Extends diagnostics/telemetry: new
DB.Stats()keys and bench output support; adjusts a valuelog test to avoid an async remap race in-race.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
TreeDB/caching/db.go |
Implements pressure sampling + budget scaling, lease trimming, and publishes new stats/counters. |
TreeDB/caching/pool_pressure_test.go |
Adds tests for budget scaling, trimming behavior, and batch-arena retention behavior under pressure. |
cmd/unified_bench/main.go |
Adds the new stats keys to the “small and stable” TreeDB cache stats print set. |
TreeDB/internal/valuelog/manager_test.go |
Adjusts mmap test setup to force fallback path without spawning an async remap goroutine. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
…sure-pool-governor # Conflicts: # TreeDB/internal/valuelog/manager_test.go
Summary
Introduce a low-risk heap-pressure governor for pool retention in
TreeDB/caching.normal/high/critical) usingruntime.MemStats(+GOMEMLIMITawareness).high: halve retention budgetcritical: disable additional retentionputBatchArena)reserveEntrySlicePoolBytes)DB.Stats():Why
Recent celestia sync runs showed elevated RSS peaks. Existing pooling reduced alloc churn but could still retain large buffers during restore spikes. This change keeps normal-path behavior intact while reducing retention under sustained heap pressure.
Tests
TreeDB/caching/pool_pressure_test.goGOWORK=off go test ./TreeDB/caching -count=1GOWORK=off go test ./TreeDB/... -count=1GOWORK=off go vet ./...Run Evidence (serial, local)
Control baseline used detached worktree at
8d33a402.fast
pr/rss-pressure-pool-governor):/home/mikers/.celestia-app-mainnet-treedb-20260313133004duration_seconds=296max_rss_kb=278243848d33a402):/home/mikers/.celestia-app-mainnet-treedb-20260313134312duration_seconds=286max_rss_kb=28995168-1,170,784 kB(-4.0%)wal_on_fast
pr/rss-pressure-pool-governor):/home/mikers/.celestia-app-mainnet-treedb-20260313133605duration_seconds=352max_rss_kb=256971928d33a402):/home/mikers/.celestia-app-mainnet-treedb-20260313134830duration_seconds=362max_rss_kb=27271116-1,573,924 kB(-5.8%)Notes