caching: pressure-aware pool retention governor #826

Open
snissn wants to merge 4 commits into pr/zipper-nodeview-scratch-standalone from pr/rss-pressure-pool-governor
Conversation

@snissn
Owner

@snissn snissn commented Mar 13, 2026

Summary

Introduce a low-risk heap-pressure governor for pool retention in TreeDB/caching.

  • Keep existing base pool budgets unchanged (for behavior stability and existing tests).
  • Add a sampled heap-pressure snapshot (normal/high/critical) using runtime.MemStats (+ GOMEMLIMIT awareness).
  • Scale effective retention budgets under pressure:
    • high: halve retention budget
    • critical: disable additional retention
  • Apply effective budgeting to:
    • batch copy arena retention (putBatchArena)
    • entry-slice retention (reserveEntrySlicePoolBytes)
  • Add periodic trimming of retained entry-slice leases when pressure is high/critical.
  • Publish new diagnostics in DB.Stats():
    • pressure level
    • heap alloc/inuse/sys/idle-unreleased
    • base vs effective pool budgets
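
The scaling rule above can be sketched as a small pure function. This is an illustration only: the type, constant, and function names (`pressureLevel`, `effectiveBudget`) are hypothetical and are not taken from the PR's code; only the halve/disable rule comes from the summary.

```go
package main

// pressureLevel mirrors the normal/high/critical levels named in the summary.
// The type and constant names are hypothetical.
type pressureLevel int

const (
	pressureNormal pressureLevel = iota
	pressureHigh
	pressureCritical
)

// effectiveBudget scales a base retention budget (in bytes) by pressure level:
// unchanged at normal, halved at high, zero at critical.
func effectiveBudget(base int64, level pressureLevel) int64 {
	if base <= 0 {
		return 0
	}
	switch level {
	case pressureHigh:
		return base / 2
	case pressureCritical:
		return 0
	default:
		return base
	}
}
```

Keeping the base budget untouched and deriving the effective value on demand is what lets normal-path behavior (and existing tests) stay unchanged.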

Why

Recent celestia sync runs showed elevated RSS peaks. Existing pooling reduced alloc churn but could still retain large buffers during restore spikes. This change keeps normal-path behavior intact while reducing retention under sustained heap pressure.

Tests

  • Added: TreeDB/caching/pool_pressure_test.go
    • budget scaling across pressure transitions
    • lease trimming in high/critical modes
    • batch arena effective budget scaling
  • Ran:
    • GOWORK=off go test ./TreeDB/caching -count=1
    • GOWORK=off go test ./TreeDB/... -count=1
    • GOWORK=off go vet ./...

Run Evidence (serial, local)

Control baseline used detached worktree at 8d33a402.

fast

  • Candidate (pr/rss-pressure-pool-governor):
    • home: /home/mikers/.celestia-app-mainnet-treedb-20260313133004
    • duration_seconds=296
    • max_rss_kb=27824384
  • Baseline (8d33a402):
    • home: /home/mikers/.celestia-app-mainnet-treedb-20260313134312
    • duration_seconds=286
    • max_rss_kb=28995168
  • RSS delta: -1,170,784 kB (-4.0%)

wal_on_fast

  • Candidate (pr/rss-pressure-pool-governor):
    • home: /home/mikers/.celestia-app-mainnet-treedb-20260313133605
    • duration_seconds=352
    • max_rss_kb=25697192
  • Baseline (8d33a402):
    • home: /home/mikers/.celestia-app-mainnet-treedb-20260313134830
    • duration_seconds=362
    • max_rss_kb=27271116
  • RSS delta: -1,573,924 kB (-5.8%)
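
The RSS deltas above are candidate minus baseline, expressed as a percentage of the baseline. A minimal helper (the name `rssDelta` is hypothetical) that reproduces the arithmetic from the `max_rss_kb` figures:

```go
package main

// rssDelta returns the candidate-minus-baseline difference in kB and the
// same difference as a percentage of the baseline.
func rssDelta(candidateKB, baselineKB int64) (deltaKB int64, pct float64) {
	deltaKB = candidateKB - baselineKB
	pct = 100 * float64(deltaKB) / float64(baselineKB)
	return deltaKB, pct
}
```

For example, `rssDelta(27824384, 28995168)` gives -1,170,784 kB (about -4.0%) for the fast profile, and `rssDelta(25697192, 27271116)` gives -1,573,924 kB (about -5.8%) for wal_on_fast.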

Notes

  • This is intentionally conservative: no behavioral changes to write/maintenance logic, only retention policy when heap pressure rises.
  • If merged, a follow-up can tune the pressure thresholds and add RSS-aware (not just heap-aware) gating.

Copilot AI review requested due to automatic review settings March 13, 2026 23:56

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d20dd613ba

Contributor

Copilot AI left a comment

Pull request overview

Adds a heap-pressure–aware governor to TreeDB/caching pooling so the system retains fewer pooled buffers/slices under sustained heap pressure, while keeping baseline budgets unchanged in normal conditions.

Changes:

  • Introduces sampled heap-pressure snapshots (with GOMEMLIMIT awareness) and scales effective retention budgets under high/critical pressure.
  • Applies effective budgeting to batch copy arena retention and entry-slice retention, and adds periodic trimming of retained entry-slice leases under pressure.
  • Exposes new pool-pressure and (base vs effective) pool budget diagnostics via DB.Stats(), and adds targeted tests.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| TreeDB/caching/db.go | Adds pool-pressure sampling/governor, hooks effective budgets into pooling paths, trims entry-slice leases under pressure, and publishes new Stats() diagnostics. |
| TreeDB/caching/pool_pressure_test.go | Adds tests covering budget scaling, lease trimming behavior, and batch arena effective budget scaling. |

Comments suppressed due to low confidence (1)

TreeDB/caching/db.go:694

  • putBatchArena still puts the buffer into batchArenaPools even when the effective retention budget is 0 (e.g., under critical pressure). That means retention is not actually disabled and batchArenaPoolBytes stops tracking what’s being retained, allowing unbounded pool growth under pressure. Gate the Put on budget > 0 (or early-return when budget <= 0) so critical pressure truly disables additional retention and accounting remains consistent.
	if budget := currentBatchArenaRetentionBudgetBytes(); budget > 0 {
		size := int64(cap(buf))
		noteEpoch := false
		for {
			held := batchArenaPoolBytes.Load()
			if held+size > budget {
				before := held
				maybeResetBatchArenaPoolBytesAfterGC()
				held = batchArenaPoolBytes.Load()
				if held == before || held+size > budget {
					return
				}
				continue
			}
			if batchArenaPoolBytes.CompareAndSwap(held, held+size) {
				noteEpoch = held == 0
				break
			}
		}
		if noteEpoch {
			noteBatchArenaPoolGC(batchArenaPoolNumGC())
		}
	}
	batchArenaPools[idx].Put(buf[:0])
}
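
One way to apply the suggested gate is sketched below. It is a simplified stand-in, not the PR's function: it uses a single hypothetical pool (`arenaPool`, `arenaPoolBytes`, `putArena`), takes the effective budget as a parameter, and omits the GC-epoch reset logic shown in the snippet above. The point is only that `Put` is reached solely after the bytes have been accounted for.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// Hypothetical stand-ins for the batch arena pool and its byte counter.
var (
	arenaPool      sync.Pool
	arenaPoolBytes atomic.Int64
)

// putArena retains buf only when the effective budget is positive and has
// room for it. It reports whether the buffer was retained.
func putArena(buf []byte, budget int64) bool {
	if budget <= 0 {
		// Critical pressure: drop the buffer so the GC can reclaim it.
		return false
	}
	size := int64(cap(buf))
	for {
		held := arenaPoolBytes.Load()
		if held+size > budget {
			return false // over budget: do not retain
		}
		if arenaPoolBytes.CompareAndSwap(held, held+size) {
			break // bytes accounted for; safe to retain
		}
	}
	arenaPool.Put(buf[:0])
	return true
}
```

With this shape, every retained buffer is reflected in the counter, so a zero budget disables retention rather than merely disabling accounting.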

Copilot AI review requested due to automatic review settings March 14, 2026 04:42
Contributor

Copilot AI left a comment

Pull request overview

Introduces heap-pressure–aware retention budgeting for key caching pools in TreeDB/caching, aiming to reduce RSS/heap retention during restore-like spikes while keeping normal-path behavior stable.

Changes:

  • Add a sampled heap pressure snapshot (normal/high/critical) (with GOMEMLIMIT awareness) and scale effective pool retention budgets under pressure.
  • Apply pressure-scaled effective budgets to batch copy arena retention and entry-slice retention, plus periodic trimming of retained entry-slice leases under pressure.
  • Expose new pressure and pool-budget diagnostics via DB.Stats() and wire selected keys into cmd/unified_bench output; adjust a valuelog test to avoid remap goroutine races.
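
Trimming retained entry-slice leases down to the effective budget could be sketched as follows. The `leasePool` type and `trimTo` method are hypothetical; the PR's actual lease structure, trim order, and locking are not shown in this conversation, so this only illustrates the idea of releasing retained slices until the held bytes fit the budget.

```go
package main

// leasePool is a hypothetical stand-in for the set of retained entry-slice
// leases. A real implementation would need synchronization.
type leasePool struct {
	leases    [][]byte
	heldBytes int64
}

// trimTo drops retained leases, most recently added first, until heldBytes
// fits within budget. It returns the number of bytes released.
func (p *leasePool) trimTo(budget int64) int64 {
	if budget < 0 {
		budget = 0
	}
	var released int64
	for p.heldBytes > budget && len(p.leases) > 0 {
		last := len(p.leases) - 1
		size := int64(cap(p.leases[last]))
		p.leases[last] = nil // clear the reference so the GC can reclaim it
		p.leases = p.leases[:last]
		p.heldBytes -= size
		released += size
	}
	return released
}
```

Under high pressure the caller would pass the halved budget; under critical pressure it would pass 0, which releases everything retained.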

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| TreeDB/caching/db.go | Implements heap-pressure sampling, budget scaling, lease trimming, and publishes new Stats() diagnostics/counters. |
| TreeDB/caching/pool_pressure_test.go | Adds tests for budget scaling and trimming behavior across pressure levels. |
| cmd/unified_bench/main.go | Prints new caching/memory pressure stat keys in the concise TreeDB cache stats output. |
| TreeDB/internal/valuelog/manager_test.go | Updates test setup to force grouped fallback path without spawning async remap goroutines under -race. |

Contributor

Copilot AI left a comment

Pull request overview

Adds a heap-pressure–aware “governor” to TreeDB/caching pooling so the system retains less pooled memory during sustained heap pressure, while keeping baseline (normal-pressure) behavior stable.

Changes:

  • Introduces a sampled heap-pressure snapshot (normal/high/critical) and scales effective retention budgets accordingly.
  • Applies scaled budgets to batch copy-arena retention and entry-slice retention, including periodic trimming of retained entry-slice leases under pressure.
  • Extends diagnostics/telemetry: new DB.Stats() keys and bench output support; adjusts a valuelog test to avoid an async remap race in -race.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| TreeDB/caching/db.go | Implements pressure sampling + budget scaling, lease trimming, and publishes new stats/counters. |
| TreeDB/caching/pool_pressure_test.go | Adds tests for budget scaling, trimming behavior, and batch-arena retention behavior under pressure. |
| cmd/unified_bench/main.go | Adds the new stats keys to the “small and stable” TreeDB cache stats print set. |
| TreeDB/internal/valuelog/manager_test.go | Adjusts mmap test setup to force fallback path without spawning an async remap goroutine. |

…sure-pool-governor

# Conflicts:
#	TreeDB/internal/valuelog/manager_test.go
3 participants