caching: add entry-slice lease/pool/fresh telemetry#838

Open
snissn wants to merge 32 commits into pr/direct-arena-retention-telemetry-oldest-eviction from pr/entry-slice-telemetry-counters

Conversation

@snissn
Owner

@snissn snissn commented Mar 15, 2026

Summary

  • add entry-slice allocator telemetry counters in TreeDB/caching:
    • entry_slice.get.{lease_hits,pool_hits,fresh_alloc}_{total,bytes_total}
    • entry_slice.put.{lease,pool,drop_budget}_{total,bytes_total}
  • expose these counters via DB.Stats() for both treedb.cache.* and treedb.process.*
  • include the new keys in unified-bench cache-stats output allowlist
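
A minimal sketch of how counters of this shape could be kept and flattened into namespaced keys. The key suffixes come from the summary above; the type and function names (`source`, `counter`, `getStats`, `record`, `snapshot`) and the `map[string]uint64` return type are assumptions for illustration, not the code in TreeDB/caching/db.go.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// source says where a get request was satisfied (hypothetical type).
type source int

const (
	leaseHit source = iota
	poolHit
	freshAlloc
)

// counter pairs an event count with the bytes attributed to those events.
type counter struct {
	total atomic.Uint64
	bytes atomic.Uint64
}

func (c *counter) add(n uint64) {
	c.total.Add(1)
	c.bytes.Add(n)
}

// getStats groups the three get-side counters.
type getStats struct {
	lease, pool, fresh counter
}

// record attributes one get of n bytes to its source.
func (s *getStats) record(src source, n uint64) {
	switch src {
	case leaseHit:
		s.lease.add(n)
	case poolHit:
		s.pool.add(n)
	default:
		s.fresh.add(n)
	}
}

// snapshot flattens the counters under prefix + "entry_slice.get.".
func (s *getStats) snapshot(prefix string) map[string]uint64 {
	p := prefix + "entry_slice.get."
	return map[string]uint64{
		p + "lease_hits_total":        s.lease.total.Load(),
		p + "lease_hits_bytes_total":  s.lease.bytes.Load(),
		p + "pool_hits_total":         s.pool.total.Load(),
		p + "pool_hits_bytes_total":   s.pool.bytes.Load(),
		p + "fresh_alloc_total":       s.fresh.total.Load(),
		p + "fresh_alloc_bytes_total": s.fresh.bytes.Load(),
	}
}

func main() {
	var s getStats
	s.record(freshAlloc, 640<<10)
	s.record(leaseHit, 640<<10)
	// The same snapshot can be emitted under both namespaces.
	for _, ns := range []string{"treedb.cache.", "treedb.process."} {
		snap := s.snapshot(ns)
		fmt.Println(ns, snap[ns+"entry_slice.get.fresh_alloc_total"])
	}
}
```

Keeping count and bytes in one small struct makes it hard to bump one without the other, which matters when the two are later divided to get an average size.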

Why

We need hard evidence for the next allocator pass on caching.getEntrySlice. Existing stats only showed retained bytes/trim behavior, not whether misses are from lease depth, pool usage, or budget drops.

Evidence (from full unified-bench run)

Command:

GOWORK=off make unified-bench
./bin/unified-bench -dbs treedb -profile fast -keys 500000 -progress=false -treedb-index-outer-leaves-in-vlog=true -valsize=100 -checkpoint-between-tests -treedb-force-value-pointers=false -treedb-cache-stats-after-tests -profile-dir=/tmp/entry-slice-tele-<ts>

Run log: /tmp/unified-run-entry-slice-tele.log
Profile dir: /tmp/entry-slice-tele-1773592024

Post-prefix_scan counters:

  • entry_slice.get.lease_hits_total=26
  • entry_slice.get.pool_hits_total=0
  • entry_slice.get.fresh_alloc_total=598
  • entry_slice.get.fresh_alloc_bytes_total=391905280
  • entry_slice.put.lease_total=624
  • entry_slice.put.pool_total=0
  • entry_slice.put.drop_budget_total=0

Interpretation: churn is dominated by fresh allocations during flush build, not by pool-budget drops.

Validation

  • go test ./TreeDB/caching ./cmd/unified_bench -count=1
  • go vet ./TreeDB/caching ./cmd/unified_bench

Copilot AI review requested due to automatic review settings March 15, 2026 16:28
Contributor

Copilot AI left a comment

Pull request overview

Adds finer-grained telemetry around the TreeDB/caching entry-slice allocator so unified-bench and DB.Stats() can distinguish reuse via leases/pools from fresh allocations and budget drops.

Changes:

  • Introduces new atomic counters for getEntrySlice and putEntrySlice (lease hits, pool hits, fresh allocs, and budget-drop puts; totals + bytes).
  • Exposes the new counters via DB.Stats() under both treedb.cache.* and treedb.process.* namespaces.
  • Extends unified-bench’s cache-stats allowlist to print the new keys.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
TreeDB/caching/db.go Adds allocator telemetry counters and exports them via DB.Stats().
cmd/unified_bench/main.go Adds the new metric keys to the unified-bench cache-stats output allowlist.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dd23cf20c1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@snissn
Owner Author

snissn commented Mar 15, 2026

Pushed 79fc3eb3 to this PR.

Added telemetry only (no behavior change):

  • treedb.cache.flush_merge.shadowed_ops_total
  • treedb.cache.flush_merge.applied_ops_total
  • treedb.cache.flush_merge.shadowed_per_applied
  • same mirrored under treedb.process.*
  • wired cache keys into unified-bench -treedb-cache-stats-after-tests allowlist.

Validation:

  • go test ./TreeDB/caching ./cmd/unified_bench -count=1
  • go vet ./TreeDB/caching ./cmd/unified_bench
  • Full benchmark: /home/mikers/tmp/perf-merge-tele-1773593600

Observed in that run:

  • shadowed_ops_total=0
  • applied_ops_total=4583983
  • shadowed_per_applied=0.000000

This is useful for the next step: it strongly suggests current throughput regressions from the earlier windowed prototype were not due to losing cross-unit dedupe.

Copilot AI review requested due to automatic review settings March 15, 2026 17:02
@snissn
Owner Author

snissn commented Mar 15, 2026

Pushed follow-up commit b7d745c7.

This splits the flush-merge telemetry by path:

  • flush_merge.deferred.*
  • flush_merge.parallel.*
    plus allowlist wiring for unified-bench cache stat output.

Validation:

  • go test ./TreeDB/caching ./cmd/unified_bench -count=1
  • go vet ./TreeDB/caching ./cmd/unified_bench
  • full bench run: /home/mikers/tmp/perf-merge-pathsplit-1773594061

Observed:

  • flush_merge.deferred.applied_ops_total=0
  • flush_merge.parallel.applied_ops_total=4583983
  • flush_merge.parallel.shadowed_ops_total=0

So the workload pressure is clearly in the parallel merge/build path, and overlap-dedupe isn’t active in this benchmark.

Contributor

Copilot AI left a comment

Pull request overview

Adds allocator/merge telemetry to TreeDB caching and surfaces it through DB.Stats() so unified-bench can report it in the cache-stats summary output.

Changes:

  • Add entry-slice get/put telemetry counters (lease hit, pool hit, fresh alloc, and budget drops) and export them via DB.Stats() under both treedb.cache.* and treedb.process.*.
  • Add flush-merge shadow/applied counters (including deferred/parallel breakdown) and derived shadowed-per-applied ratios to DB.Stats().
  • Extend unified-bench’s cache-stats allowlist to include the new stat keys.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
cmd/unified_bench/main.go Allowlist updated to print the new TreeDB entry-slice + flush-merge stat keys in cache-stats output.
TreeDB/caching/db.go Adds new atomic counters, wires them into getEntrySlice/putEntrySlice, tracks flush-merge shadow/applied ops, and exports all via DB.Stats().

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7d745c7f5

@snissn
Owner Author

snissn commented Mar 15, 2026

Added commit 6668e5c (valuelog: refresh stale mmap range before read fallback).

Key validation (serial):

  • unified-bench full (fast): /home/mikers/tmp/perf-remapfix-r3-1773596133
    • wall: 23.615s
    • random_read_parallel: 3,554,475
  • focused random_read_parallel with cache stats:
    • ops/s: 8,139,744
    • treedb.vlog.mmap_read.hits=1,864,999
    • miss_out_of_range=9
    • fallback_readat=0

run_celestia on this commit:

  • fast: /home/mikers/.celestia-app-mainnet-treedb-20260315073920
    • duration_seconds=330
    • max_rss_kb=12,959,228
    • end_app_bytes=5,523,075,685
  • wal_on_fast: /home/mikers/.celestia-app-mainnet-treedb-20260315074517
    • duration_seconds=382
    • max_rss_kb=13,044,084
    • end_app_bytes=5,293,748,139

Also ran a local-only follow-up experiment to cap large-buffer headroom in valuelog.grow. Results were mixed/high variance and included a clear bad outlier in wal_on_fast, so that experiment was fully reverted and is not included in this PR.

Copilot AI review requested due to automatic review settings March 15, 2026 18:21
@snissn
Owner Author

snissn commented Mar 15, 2026

Pushed follow-up commit d614a7e:

  • caching: track batch arena retained high-water bytes

What it adds:

  • New global high-water metric for batch arena retained bytes (pool + global leases):
    • treedb.cache.batch_arena.retained_bytes_global_max_estimate
    • treedb.process.batch_arena.retained_bytes_global_max_estimate
  • Metric is updated on retention growth paths (pool puts and lease retains).
  • Added unit test: TestNoteBatchArenaRetainedBytesMax.
  • Added key to unified-bench cache-stats allowlist so it prints in -treedb-cache-stats-after-tests.

Validation:

  • go test ./TreeDB/caching -count=1
  • go vet ./TreeDB/caching
  • go test ./cmd/unified_bench -count=1
  • go vet ./cmd/unified_bench
  • focused bench print confirms new key appears in cache stats output.

Contributor

Copilot AI left a comment

Pull request overview

Adds new allocator/merge/memory telemetry to TreeDB’s caching layer and wires it through DB.Stats() so unified-bench can report the new counters.

Changes:

  • Add entry-slice get/put telemetry counters (lease hits, pool hits, fresh allocs; plus put lease/pool/budget drops) and expose them via DB.Stats() under both treedb.cache.* and treedb.process.*.
  • Add additional stats/telemetry: batch-arena retained-bytes global max estimate; flush-merge shadowed/applied counters and ratios.
  • Update unified-bench cache-stats allowlist to include the new keys; adjust valuelog mmap read path to synchronously refresh on out-of-range misses.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
cmd/unified_bench/main.go Extends the cache-stats allowlist so unified-bench prints the new TreeDB stat keys.
TreeDB/internal/valuelog/reader_mmap.go Adds synchronous mmap refresh on out-of-range reads to reduce repeated fallbacks to ReadAt.
TreeDB/caching/db.go Introduces new global telemetry counters (entry-slice + flush-merge + batch-arena max) and exports them via DB.Stats().
TreeDB/caching/batch_arena_retained_max_test.go Adds a unit test for the new batch-arena retained-max tracking helper.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d614a7ee6e

@snissn
Owner Author

snissn commented Mar 15, 2026

Pushed follow-up commit 877d55b:

  • caching: expose batch-arena stats in expvar treedb map

Why:

  • run_celestia diagnostics sample /debug/vars; previously treedb expvar selection excluded batch_arena keys, so runtime correlation had blanks.

Change:

  • TreeDB/caching/expvar_stats.go now includes:
    • treedb.cache.batch_arena.
    • treedb.process.batch_arena.
  • Updated TreeDB/caching/expvar_stats_test.go accordingly.

Validation:

  • go test ./TreeDB/caching -count=1
  • go vet ./TreeDB/caching
  • go test ./cmd/unified_bench -count=1

Measured run with sampler (after this expvar change):

  • run: /home/mikers/.celestia-app-mainnet-treedb-20260315083616 (wal_on_fast)
    • duration_seconds=461
    • max_rss_kb=13135912
    • end_app_bytes=5583231489
  • sampler: /tmp/celestia_stats_expvar_batch_1773599746.tsv
    • max treedb.process.batch_arena.retained_bytes_estimate = 201326592
    • max treedb.process.batch_arena.retained_bytes_global_max_estimate = 201326592
    • max treedb.process.batch_arena.pool_bytes_estimate = 48758784
    • treedb.process.batch_arena.leased_bytes remained 0 in samples

Interpretation:

  • Batch-arena retained max repeatedly hits the current hard cap (192MiB), which is now directly observable in /debug/vars and can drive the next cap/trim policy iteration with evidence.
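
Prefix-based selection of this kind can be sketched as a filter over a stats map. The two batch_arena prefixes are the ones named above; the function name `selectStats`, the `map[string]string` shape, and the `treedb.cache.example.unrelated_total` key are assumptions for illustration. Note that 201326592 is exactly 192 × 1024 × 1024, which is how the sampled maximum lines up with the 192MiB cap.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// exportedPrefixes lists the key prefixes that should be published.
var exportedPrefixes = []string{
	"treedb.cache.batch_arena.",
	"treedb.process.batch_arena.",
}

// selectStats keeps only the entries whose key starts with one of prefixes.
func selectStats(all map[string]string, prefixes []string) map[string]string {
	out := make(map[string]string)
	for k, v := range all {
		for _, p := range prefixes {
			if strings.HasPrefix(k, p) {
				out[k] = v
				break
			}
		}
	}
	return out
}

func main() {
	all := map[string]string{
		"treedb.process.batch_arena.retained_bytes_estimate": "201326592",
		"treedb.process.batch_arena.pool_bytes_estimate":     "48758784",
		"treedb.cache.example.unrelated_total":               "1", // hypothetical key, filtered out
	}
	sel := selectStats(all, exportedPrefixes)
	keys := make([]string, 0, len(sel))
	for k := range sel {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order for display
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, sel[k])
	}
	fmt.Println(192<<20 == 201326592) // true
}
```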

Copilot AI review requested due to automatic review settings March 15, 2026 18:59
Contributor

Copilot AI left a comment

Pull request overview

Adds additional allocator/pooling telemetry to TreeDB’s caching layer and surfaces it through DB.Stats() and unified-bench output, to better diagnose entry-slice allocation churn and related retention behavior.

Changes:

  • Add entry-slice get/put telemetry counters (lease hits, pool hits, fresh allocs, budget drops) and expose them via DB.Stats() under both treedb.cache.* and treedb.process.*.
  • Add additional batch-arena “global max estimate” stats and new flush-merge shadow/applied counters/ratios; include new keys in unified-bench cache-stats allowlist.
  • Expand expvar “treedb” stats filtering to include batch-arena prefixes.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
cmd/unified_bench/main.go Allowlist new cache/process stats keys for unified-bench output.
TreeDB/caching/db.go Introduce new telemetry counters and export them via DB.Stats().
TreeDB/caching/expvar_stats.go Expand expvar-published stat prefixes to include batch-arena stats.
TreeDB/caching/expvar_stats_test.go Update expvar selection test expectations for new prefixes/keys.
TreeDB/caching/batch_arena_retained_max_test.go Add tests for new batch-arena max tracking helpers.
TreeDB/internal/valuelog/reader_mmap.go Add synchronous remap-on-out-of-range behavior via tryRefreshMmapRange.

@snissn
Owner Author

snissn commented Mar 15, 2026

Follow-up patch pushed: 20b88d1f (pressure-aware mutable flush threshold)

What changed

  • Scale effective mutable flush threshold by pool-pressure level:
    • high pressure: base/2
    • critical pressure: base/4 (floored at 32MiB when base > 32MiB)
  • Added stats keys:
    • treedb.cache.mutable_flush_threshold_base_bytes
    • treedb.cache.mutable_flush_threshold_effective_bytes
  • Added unit coverage for threshold scaling in pool_pressure_test.go.
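
One reading of the scaling rule above, as a sketch: high pressure halves the base, critical pressure quarters it, and the critical result is raised to 32MiB only when the base itself exceeds 32MiB. The type, constant, and function names are hypothetical, and the exact floor semantics are an assumption drawn from the bullet text rather than from db.go.

```go
package main

import "fmt"

// pressureLevel is a hypothetical pool-pressure classification.
type pressureLevel int

const (
	pressureNone pressureLevel = iota
	pressureHigh
	pressureCritical
)

// floorBytes is the 32MiB floor applied under critical pressure.
const floorBytes int64 = 32 << 20

// effectiveFlushThreshold scales base by the current pressure level.
func effectiveFlushThreshold(base int64, level pressureLevel) int64 {
	switch level {
	case pressureHigh:
		return base / 2
	case pressureCritical:
		scaled := base / 4
		// The floor only applies when the base itself is above the floor.
		if base > floorBytes && scaled < floorBytes {
			return floorBytes
		}
		return scaled
	default:
		return base
	}
}

func main() {
	const base int64 = 64 << 20 // 64MiB, an arbitrary example base
	for _, lvl := range []pressureLevel{pressureNone, pressureHigh, pressureCritical} {
		fmt.Printf("level=%d effective=%d\n", lvl, effectiveFlushThreshold(base, lvl))
	}
}
```

With a 64MiB base, the critical case would compute 16MiB and then be raised to the 32MiB floor, so high and critical pressure yield the same effective threshold for that base.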

Validation

  • go test ./TreeDB/caching -count=1
  • go test ./cmd/unified_bench -count=1
  • go vet ./TreeDB/caching
  • go vet ./cmd/unified_bench

Serial run_celestia (local, same branch)

  • fast run home: /home/mikers/.celestia-app-mainnet-treedb-20260315092135
    • duration_seconds=285
    • max_rss_kb=9480572
    • end_app_bytes=4581556356
    • du -sb application.db=4581556356
  • wal_on_fast run home: /home/mikers/.celestia-app-mainnet-treedb-20260315092639
    • duration_seconds=305
    • max_rss_kb=9302216
    • end_app_bytes=5107277253
    • du -sb application.db=5107277253

Against immediate pre-change runs (same machine)

  • fast (vs ...091021):
    • wall: 335 -> 285 (-14.9%)
    • max RSS: 12620824 -> 9480572 (-24.9%)
    • end_app_bytes: 5740044390 -> 4581556356 (-20.2%)
  • wal_on_fast (vs ...090100):
    • wall: 471 -> 305 (-35.2%)
    • max RSS: 14225288 -> 9302216 (-34.6%)
    • end_app_bytes: 5955207462 -> 5107277253 (-14.2%)

Notes

  • wal_on_fast baseline run used for comparison had previously shown chunk-67 stall behavior; still, the direction and magnitude here are strong.
  • No correctness errors observed in either run.

@snissn
Owner Author

snissn commented Mar 16, 2026

Follow-up tuning results (post-99652f34):

The kept baseline on this PR remains commit 99652f3 (mergeInternal workload-aware parallel gating).

Rejected experiment A (not committed): prefetch only when useParallel in mergeInternal

  • Fast run: /home/mikers/.celestia-app-mainnet-treedb-20260316105654/sync/sync-time.log
    • duration_seconds=303
    • max_rss_kb=10634532
    • end_app_bytes=4951620150
  • wal_on_fast run: /home/mikers/.celestia-app-mainnet-treedb-20260316110213/sync/sync-time.log
    • duration_seconds=337
    • max_rss_kb=11867568
    • end_app_bytes=5374205478
  • Result: regressed vs the 99652f3 pair; reverted.

Rejected experiment B (not committed): raise minParallelOpsPerChild 256 -> 384

  • Fast run: /home/mikers/.celestia-app-mainnet-treedb-20260316110948/sync/sync-time.log
    • duration_seconds=292
    • max_rss_kb=10904920
    • end_app_bytes=5063496393
  • wal_on_fast run: /home/mikers/.celestia-app-mainnet-treedb-20260316111456/sync/sync-time.log
    • duration_seconds=351
    • max_rss_kb=11879656
    • end_app_bytes=5345203500
  • Result: clearly regressed vs the 99652f3 pair; reverted.

Reference kept pair for 99652f3:

  • fast: /home/mikers/.celestia-app-mainnet-treedb-20260316104044/sync/sync-time.log
  • wal_on_fast: /home/mikers/.celestia-app-mainnet-treedb-20260316104532/sync/sync-time.log

Branch is clean and unchanged after these probes.

@snissn
Owner Author

snissn commented Mar 16, 2026

Additional rejected probes (post-99652f34), all reverted:

  1) Empty-shard in-place memtable reuse during rotate
  2) Size-classed append-only entry-slice pool
  3) Applying appendOnlyMemtableCapacityHint inside newMutableMemtableWithCapacityMode

Unified-bench evidence:

  • /home/mikers/tmp/perf-1773696631 (size-class pool)
  • /home/mikers/tmp/perf-1773696940 (empty-shard reuse path)
  • /home/mikers/tmp/perf-1773697116 (newMutable capacity-hint path)

Result summary:

  • Target allocator did not improve (getAppendOnlyEntries remained ~70-80MB in batch_write_steady and ~80MB in sequential_write in these probes).
  • Throughput had broad regressions/noise with no compensating alloc win.

Branch has been returned to clean kept state (99652f3 only).

Copilot AI review requested due to automatic review settings March 16, 2026 22:31
Contributor

Copilot AI left a comment

Pull request overview

This PR expands TreeDB caching/valuelog observability by adding detailed allocator and memory-pressure telemetry, wiring it through DB.Stats()/expvar, and updating unified-bench’s cache-stats allowlist to display the new counters.

Changes:

  • Add entry-slice get/put telemetry counters (lease vs pool vs fresh alloc, totals + bytes) and export them under both treedb.cache.* and treedb.process.*.
  • Expand memory-pressure/runtime breakdown stats and expose additional batch-arena global-max estimates and flush-merge shadowing/applied counters.
  • Update unified-bench allowlist and add/adjust tests for the new stats and pressure-scaling behavior; also includes some related performance/memory tuning (zipper scratch reuse, valuelog mmap refresh, grouped-frame cache defaults).

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
cmd/unified_bench/main.go Adds new TreeDB cache/process stat keys to unified-bench output allowlist.
TreeDB/caching/db.go Implements new telemetry counters, pressure-scaled mutable flush threshold, expanded memory stats, and exports new Stats() keys.
TreeDB/caching/expvar_stats.go Expands expvar selection allowlist to include additional cache/process stat prefixes.
TreeDB/caching/expvar_stats_test.go Updates expvar selection/coercion tests for the expanded allowlist.
TreeDB/caching/pool_pressure_test.go Adds tests for pressure snapshot semantics and mutable flush-threshold scaling.
TreeDB/caching/memory_stats_test.go Adds coverage ensuring process memory/runtime breakdown keys are present and parseable.
TreeDB/caching/batch_arena_sizing_test.go Adjusts sizing test to respect current max chunk constraints.
TreeDB/caching/batch_arena_retained_max_test.go Adds tests for new batch-arena global-max tracking helpers.
TreeDB/caching/append_only_hint_test.go Adds test ensuring append-only prealloc hint tracks pressure-scaled mutable threshold.
TreeDB/internal/valuelog/buf.go Adds global grow-buffer telemetry counters and snapshot API.
TreeDB/internal/valuelog/buf_test.go Adds tests validating grow-buffer telemetry deltas.
TreeDB/internal/valuelog/reader_mmap.go Adds synchronous remap refresh on out-of-range mmap misses to reduce ReadAt fallback.
TreeDB/internal/valuelog/manager.go Reduces default grouped-frame cache size/bytes and documents rationale.
TreeDB/internal/memtable/append_only.go Switches entries-slice replacement to pooled get/put helper.
TreeDB/db/api.go Exposes valuelog decode-buffer grow stats on the non-cached DB stats path.
TreeDB/zipper/zipper.go Adds merge-scratch leaf-page scratch reuse and tweaks parallel merge heuristics.
TreeDB/zipper/zipper_test.go Updates tests for new zipper method signatures.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ecc4cab7f

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d4653e152

Copilot AI review requested due to automatic review settings March 17, 2026 00:14
Contributor

Copilot AI left a comment

Pull request overview

Adds expanded allocator/memory telemetry (primarily for TreeDB/caching entry-slice and batch-arena behavior) and surfaces it through DB.Stats()/expvar/unified-bench, alongside a handful of memory/IO related tuning and instrumentation improvements across TreeDB components.

Changes:

  • Add new cache/process stats: entry-slice get/put source counters, batch-arena global max/in-flight metrics, flush-merge shadow/applied counters, append-only memtable allocation source counters, and richer process memory breakdown.
  • Export additional telemetry via expvar and include new keys in unified-bench’s cache-stats allowlist.
  • Add supporting behavior/tuning changes and tests (valuelog mmap refresh on out-of-range, grouped-frame cache default reduction, zipper merge scratch buffer reuse, pressure-scaled mutable flush threshold & append-only prealloc alignment).

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
cmd/unified_bench/main.go Expands TreeDB cache-stats allowlist to include new telemetry keys.
TreeDB/zipper/zipper.go Adds mergeScratch leaf-page scratch pooling; refines parallel-merge gating; threads scratch context into leaf loads/coalesce.
TreeDB/zipper/zipper_test.go Updates tests for new zipper method signatures.
TreeDB/internal/valuelog/reader_mmap.go Adds synchronous remap refresh on mmap out-of-range misses to reduce ReadAt fallback churn.
TreeDB/internal/valuelog/manager.go Lowers default grouped-frame cache size to reduce retained decoded payloads.
TreeDB/internal/valuelog/buf.go Adds global grow-buffer telemetry counters and snapshot function.
TreeDB/internal/valuelog/buf_test.go Adds unit test validating grow-buffer stats snapshots track deltas.
TreeDB/internal/memtable/append_only.go Routes append-only entry-slice replacement through pooled get/put helpers.
TreeDB/db/api.go Exposes valuelog decode-buffer grow stats via DB.Stats().
TreeDB/caching/db.go Main telemetry additions: entry-slice counters, batch-arena max/in-flight, flush-merge stats, append-only allocation counters; adds pressure-scaled mutable flush threshold and aligns append-only prealloc; adjusts some retention constants.
TreeDB/caching/expvar_stats.go Broadens expvar-exported stat selection (now includes treedb.process.* plus additional prefixes).
TreeDB/caching/expvar_stats_test.go Updates expvar selection/coercion test for new keys.
TreeDB/caching/pool_pressure_test.go Adds tests for unreleased idle heap contribution and mutable flush threshold scaling under pressure.
TreeDB/caching/memory_stats_test.go Adds test ensuring process memory stats include runtime breakdown + process/cache parity checks.
TreeDB/caching/batch_arena_sizing_test.go Adds test for in-flight bytes tracking + adjusts sizing test for max-chunk cap.
TreeDB/caching/batch_arena_retained_max_test.go Adds tests for retained-bytes max tracking helpers.
TreeDB/caching/batch_arena_ownership_test.go Adds tests ensuring BTree steal is suppressed under deferred-pressure and telemetry increments accordingly.
TreeDB/caching/append_only_hint_test.go Adds test that append-only capacity hint tracks pressure-scaled mutable threshold.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e595075b8b

Contributor

Copilot AI left a comment

Pull request overview

This PR adds memory/heap-pressure aware behavior and expanded telemetry across TreeDB’s zipper merge path, value-log mmap reads, and caching layer resource pools to reduce peak RSS and improve observability during restore-like workloads.

Changes:

  • Add a configurable pressure signal to tighten/loosen zipper internal-node parallel merge fan-out, plus scratch-buffer reuse for leaf-page reads.
  • Improve value-log mmap behavior by synchronously refreshing stale/out-of-range mappings on read misses, and reduce default grouped-frame cache retention.
  • Expand caching-layer memory-pressure sampling and stats (process memory breakdown, batch-arena/entry-slice pool metrics, flush-merge metrics), including pressure-scaled mutable flush thresholds and additional tests.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
TreeDB/zipper/zipper.go Adds pressure-aware parallel internal merge gating and leaf-page scratch reuse via mergeScratch.
TreeDB/zipper/zipper_test.go Updates tests for new loadNode / coalesceLeafChildren signatures.
TreeDB/zipper/parallel_policy_test.go Adds unit tests for new parallel-merge threshold logic and pressure normalization.
TreeDB/internal/valuelog/reader_mmap.go Adds synchronous mmap refresh on out-of-range misses to reduce repeated ReadAt fallbacks.
TreeDB/internal/valuelog/manager.go Reduces default grouped-frame cache size/bytes to limit retained decoded payloads.
TreeDB/internal/valuelog/buf.go Adds atomic counters and snapshot API for grow-buffer telemetry.
TreeDB/internal/valuelog/buf_test.go Adds tests validating grow-buffer telemetry deltas.
TreeDB/internal/memtable/append_only.go Routes entry-slice replacement through pooling helpers when resizing on reset.
TreeDB/db/vacuum_online.go Wires DB-provided zipper pressure source into vacuum’s zipper instance.
TreeDB/db/db.go Adds DB API to set zipper pressure source for current and future zippers.
TreeDB/db/api.go Exposes grow-buffer telemetry via DB stats.
TreeDB/caching/zipper_pressure_test.go Tests that WAL-off open wires a zipper pressure source into the backend.
TreeDB/caching/pool_pressure_test.go Adds tests for unreleased idle heap accounting and pressure-scaled flush threshold.
TreeDB/caching/memory_stats_test.go Adds tests ensuring process memory breakdown keys are present and parseable.
TreeDB/caching/expvar_stats.go Expands expvar selection to include broader treedb.process.* and additional cache/process prefixes.
TreeDB/caching/expvar_stats_test.go Updates filter/coercion expectations for newly selected expvar stats.
TreeDB/caching/db.go Implements broader memory-pressure snapshot, pressure-scaled flush threshold, more pool/flush telemetry, and wiring of zipper pressure source in WAL-off mode.
TreeDB/caching/batch_arena_sizing_test.go Adds test for in-flight batch-arena byte tracking and updates sizing expectation for max-chunk.
TreeDB/caching/batch_arena_retained_max_test.go Adds tests for retained/pool/leased max tracking helpers.
TreeDB/caching/batch_arena_ownership_test.go Adds tests ensuring BTree Steal is suppressed under deferred-pressure and counters update.
TreeDB/caching/append_only_hint_test.go Adds test to ensure append-only capacity hints respect pressure-scaled mutable thresholds.
cmd/unified_bench/main.go Extends benchmark output to include new cache stats keys.

codex added 3 commits March 17, 2026 08:42
…to pr/entry-slice-telemetry-counters

# Conflicts:
#	TreeDB/caching/db.go
Copilot AI review requested due to automatic review settings March 18, 2026 00:01
Contributor

Copilot AI left a comment

Pull request overview

Adds expanded caching and process telemetry to better understand allocator behavior and memory pressure effects, and exposes these metrics through DB stats and unified-bench reporting.

Changes:

  • Adds new cache/process counters (entry-slice lease/pool/fresh paths, flush-merge shadowing/applies, batch arena global/in-flight maxima, mmap refresh behavior, decode-buffer grow stats).
  • Introduces heap-pressure-aware policies (mutable flush threshold scaling, parallel internal-merge gating via configurable pressure source).
  • Updates unified-bench cache-stats allowlist and adds targeted unit tests for the new policies/telemetry.
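
As a rough illustration of the first bullet, the entry-slice get path could be instrumented with atomic counters and read back through a snapshot keyed by the stat names from the PR summary. This is a minimal sketch, not the TreeDB implementation: the `entrySliceCounters` type, the `getSource` enum, and the `recordGet` and `snapshot` methods are hypothetical names introduced for illustration, and only the key strings come from the PR description.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// getSource says where an entry slice came from (hypothetical enum).
type getSource int

const (
	fromLease getSource = iota // reused from a held lease
	fromPool                   // reused from the shared pool
	fromFresh                  // newly allocated
)

// entrySliceCounters is a hypothetical counter set, not the TreeDB type.
// Each field is updated atomically so hot paths need no lock.
type entrySliceCounters struct {
	leaseHits       atomic.Uint64
	poolHits        atomic.Uint64
	freshAlloc      atomic.Uint64
	freshAllocBytes atomic.Uint64
}

// recordGet bumps the counter for the path that served a get.
// Bytes are tracked only for fresh allocations in this sketch.
func (c *entrySliceCounters) recordGet(src getSource, bytes uint64) {
	switch src {
	case fromLease:
		c.leaseHits.Add(1)
	case fromPool:
		c.poolHits.Add(1)
	case fromFresh:
		c.freshAlloc.Add(1)
		c.freshAllocBytes.Add(bytes)
	}
}

// snapshot returns a point-in-time copy keyed by the stat names
// listed in the PR summary.
func (c *entrySliceCounters) snapshot() map[string]uint64 {
	return map[string]uint64{
		"entry_slice.get.lease_hits_total":        c.leaseHits.Load(),
		"entry_slice.get.pool_hits_total":         c.poolHits.Load(),
		"entry_slice.get.fresh_alloc_total":       c.freshAlloc.Load(),
		"entry_slice.get.fresh_alloc_bytes_total": c.freshAllocBytes.Load(),
	}
}

func main() {
	var c entrySliceCounters
	c.recordGet(fromFresh, 4096)
	c.recordGet(fromLease, 0)
	fmt.Println(c.snapshot()["entry_slice.get.fresh_alloc_bytes_total"])
}
```

Each counter is loaded independently, so a snapshot taken under concurrent writes is not a consistent cut across keys; for diagnostic telemetry that trade-off is usually acceptable.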

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| cmd/unified_bench/main.go | Extends cache-stats allowlist to include new telemetry keys. |
| TreeDB/zipper/zipper.go | Adds scratch pooling + pressure-aware parallel merge gating and wiring hooks. |
| TreeDB/zipper/zipper_test.go | Updates tests for new zipper method signatures. |
| TreeDB/zipper/parallel_policy_test.go | Adds tests for new parallel-merge threshold policy helpers. |
| TreeDB/internal/valuelog/reader_mmap.go | Adds synchronous mmap refresh path on out-of-range reads. |
| TreeDB/internal/valuelog/manager.go | Adjusts default grouped-frame cache sizing. |
| TreeDB/internal/valuelog/buf.go | Adds decode buffer grow telemetry counters. |
| TreeDB/internal/valuelog/buf_test.go | Tests grow-buffer telemetry accounting. |
| TreeDB/internal/memtable/append_only.go | Centralizes entry-slice replacement via pooling helpers. |
| TreeDB/db/vacuum_online.go | Wires DB pressure signal into vacuum zipper instances. |
| TreeDB/db/db.go | Adds DB API to set zipper parallel merge pressure source. |
| TreeDB/db/api.go | Exposes valuelog grow-buffer telemetry in DB stats. |
| TreeDB/caching/db.go | Adds extensive cache/process telemetry + pressure-aware policies and wiring. |
| TreeDB/caching/expvar_stats.go | Broadens expvar-export selection to include new metric families. |
| TreeDB/caching/expvar_stats_test.go | Updates expvar selection test for expanded output set. |
| TreeDB/caching/pool_pressure_test.go | Adds tests for pressure calculations + mutable-threshold scaling. |
| TreeDB/caching/memory_stats_test.go | Adds tests validating new process memory stats breakdown. |
| TreeDB/caching/zipper_pressure_test.go | Tests wiring of zipper pressure source under DisableWAL. |
| TreeDB/caching/batch_arena_sizing_test.go | Adds test for in-flight bytes lifecycle tracking. |
| TreeDB/caching/batch_arena_retained_max_test.go | Adds tests for retained/max tracking helpers. |
| TreeDB/caching/batch_arena_ownership_test.go | Tests deferred-pressure steal suppression logic/telemetry. |
| TreeDB/caching/append_only_hint_test.go | Tests append-only capacity hint under pressure-scaled thresholds. |


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 32220a460a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8726ba6da5

3 participants