valuelog: disable current writable mmaps by default by snissn · Pull Request #848 · snissn/gomap

snissn · 2026-03-25T19:52:09Z

Summary

- disable persistent mmap growth for current-writable value-log segments by default
- add per-instance TreeDB expvar memory visibility so multi-DB app processes can attribute RSS correctly
- keep current-writable mmap behavior available behind `TREEDB_VLOG_ENABLE_CURRENT_WRITABLE_MMAP=1`

Why

Profiling the `fast` profile showed that the RSS regression was not just heap pressure. The application.db TreeDB instance was accumulating large current-writable value-log remap state, with dead mmap bytes growing into the tens of GB during restore. Disabling persistent current-writable mmaps removes that remap churn and materially lowers peak RSS.

Validation

- `GOWORK=off go test ./TreeDB/caching ./TreeDB/internal/valuelog -count=1`
- `GOWORK=off go vet ./TreeDB/caching ./TreeDB/internal/valuelog`
- real `fast` A/B:
  - before: max RSS about 18.54M kB
  - after: max RSS 10.85M kB
  - reduction: about 7.33 GiB / 41.46%
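The reduction can be rechecked from the rounded figures. This sketch assumes, as on Linux, that "kB" means KiB (1024 bytes). `rssReduction` is a hypothetical helper:

```go
package main

import "fmt"

// kibPerGiB is the number of KiB in one GiB (Linux reports RSS in KiB labeled "kB").
const kibPerGiB = 1024 * 1024

// rssReduction returns the absolute drop in GiB and the relative drop in percent.
func rssReduction(beforeKB, afterKB float64) (gib, pct float64) {
	delta := beforeKB - afterKB
	return delta / kibPerGiB, 100 * delta / beforeKB
}

func main() {
	gib, pct := rssReduction(18.54e6, 10.85e6) // rounded values from the A/B above
	fmt.Printf("reduction: %.2f GiB (%.2f%%)\n", gib, pct)
}
```

With the rounded inputs it prints `reduction: 7.33 GiB (41.48%)`. The reported 41.46% presumably comes from the unrounded measurements.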

Copilot

Pull request overview

Disables persistent mmap growth for current-writable value-log segments by default (opt-in via env var), while improving observability for RSS and value-log mmap residency (including multi-DB expvar attribution) and tightening correctness around current-segment read visibility and dict training.

Changes:

- Disable persistent current-writable value-log mmaps by default and add a current-writable read barrier that flushes buffered tails before backend reads.
- Add process RSS/HWM + peak tracking and expose per-instance expvar stats to attribute memory usage in multi-DB processes.
- Expand Zstd dict autotune candidates, stabilize dict training offsets, and add more corruption/staleness regression tests.
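The ordering rule behind the read barrier can be modeled in memory. In this sketch `segment`, `Append`, and `ReadAt` are hypothetical (`PendingBytes` mirrors the name added in writer.go), and the backend is a byte slice rather than a file:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// segment is a hypothetical current-writable value-log segment: appends go to an
// in-memory tail and reach the backend (standing in for the file) only on flush.
type segment struct {
	mu      sync.Mutex
	backend []byte // bytes visible to backend reads
	pending []byte // buffered-but-unflushed tail
}

// Append buffers p and returns the logical offset it was written at.
func (s *segment) Append(p []byte) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	off := int64(len(s.backend) + len(s.pending))
	s.pending = append(s.pending, p...)
	return off
}

// PendingBytes reports how many appended bytes have not reached the backend yet.
func (s *segment) PendingBytes() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.pending)
}

// ReadAt applies the read barrier: if the requested range reaches into the
// buffered tail, flush first so the backend read sees every appended byte.
func (s *segment) ReadAt(off int64, n int) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	end := off + int64(n)
	if end > int64(len(s.backend)) && len(s.pending) > 0 {
		s.backend = append(s.backend, s.pending...) // flush the tail
		s.pending = s.pending[:0]
	}
	if off < 0 || n < 0 || end > int64(len(s.backend)) {
		return nil, errors.New("read out of range")
	}
	out := make([]byte, n)
	copy(out, s.backend[off:end])
	return out, nil
}

func main() {
	var s segment
	off := s.Append([]byte("value-1"))
	fmt.Println("pending before read:", s.PendingBytes())
	v, err := s.ReadAt(off, 7)
	fmt.Println(string(v), err, "pending after read:", s.PendingBytes())
}
```

This model flushes only when a read reaches into the buffered tail, so reads of already-flushed data skip the flush. Whether TreeDB's barrier checks the range or flushes unconditionally is not stated here.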

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| cmd/unified_bench/main.go | Include new process/mmap memory stats keys in benchmark stats output. |
| TreeDB/internal/valuelog/writer.go | Add `PendingBytes()` to report buffered-but-unflushed writer tail. |
| TreeDB/internal/valuelog/reader_mmap.go | Gate persistent mmaps for current-writable segments behind `TREEDB_VLOG_ENABLE_CURRENT_WRITABLE_MMAP`. |
| TreeDB/internal/valuelog/mmap_safety_test.go | Force-enable current-writable mmap for mmap safety regression tests. |
| TreeDB/internal/valuelog/manager_test.go | Add test ensuring current-writable reads fall back safely without persistent mmap. |
| TreeDB/internal/valuelog/manager.go | Add current-writable read barrier hook and invoke it on reads. |
| TreeDB/internal/valuelog/dict_invalid_offsets_test.go | Add regression coverage for invalid repeat-offset dictionaries from small histories. |
| TreeDB/internal/valuelog/dict_compressibility_bench_test.go | Use explicit non-zero repeat offsets to avoid invalid dictionaries in benches. |
| TreeDB/internal/valuelog/dict_bench_helper_test.go | Add tests asserting bench dict builders use safe initial offsets. |
| TreeDB/internal/valuelog/dict_autotune_bench_test.go | Use explicit offsets when building dicts with history in benches. |
| TreeDB/internal/valuelog/autotune_options_test.go | Update tests to validate expanded default candidate sets. |
| TreeDB/internal/valuelog/autotune_options.go | Expand default candidate history/dict sizes for autotune. |
| TreeDB/internal/memtable/hash_sorted_corruption_test.go | Add regression tests for stale-index recovery and rebuild correctness. |
| TreeDB/internal/memtable/hash_sorted.go | Harden read paths against stale indices; improve frozen-index validation. |
| TreeDB/internal/compression/trainer_test.go | Add coverage for selecting expanded history/dict candidate sizes. |
| TreeDB/internal/compression/trainer.go | Support separate history/dict candidates; shape/validate dict sizes; cap max history. |
| TreeDB/db/leaf_page_log.go | Promote registered segments to current-writable and expose a read-barrier setter. |
| TreeDB/cmd/vlog_dict_realdata/main_dict_test.go | Add harness test ensuring fixed-size trainer uses safe offsets. |
| TreeDB/cmd/vlog_dict_realdata/main.go | Plumb `CandidateDictBytes` and use explicit offsets for dict training. |
| TreeDB/caching/vlog_dict.go | Pass dict-size candidates into the compression trainer. |
| TreeDB/caching/vlog_current_segment_readbarrier_test.go | Add regression tests ensuring buffered tails are flushed before reads. |
| TreeDB/caching/vlog_autotune_bench.go | Update bench defaults for expanded history/dict candidate sizes. |
| TreeDB/caching/snapshot.go | Switch snapshot backend reads to use the new `flushValueLogForBackendRead()` barrier. |
| TreeDB/caching/process_memory_other.go | Stub RSS reader for non-Linux platforms. |
| TreeDB/caching/process_memory_linux.go | Implement RSS/HWM parsing via `/proc/self/status` on Linux. |
| TreeDB/caching/memory_stats_test.go | Add assertions for new process RSS/peak and backend mmap attribution stats. |
| TreeDB/caching/expvar_stats_test.go | Extend expvar selection tests and add multi-instance expvar payload coverage. |
| TreeDB/caching/expvar_stats.go | Track all open DB instances for expvar and include backend mmap/identity families. |
| TreeDB/caching/db.go | Add process memory sampler + peak stats, backend/cache mmap aggregation, and stronger backend-read flushing barrier. |

TreeDB/caching/db.go

TreeDB/internal/valuelog/manager.go

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8fa4008cf8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

- open a pull request for review
- mark a draft as ready
- comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

TreeDB/caching/expvar_stats.go

Copilot

Pull request overview

Copilot reviewed 42 out of 42 changed files in this pull request and generated 2 comments.

TreeDB/caching/db.go

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 78c4a72e41

TreeDB/caching/db.go

TreeDB/caching/expvar_stats.go

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 717c94c47a

TreeDB/internal/valuelog/manager.go

snissn · 2026-03-26T15:59:27Z

Update: landed rewrite/dict optimization in fc39d8af.

What changed

- `vlog-rewrite` now re-encodes grouped large block frames into dict frames when a valid dict is available, not only single-record block frames.
- The rewrite writer batches dict frames during rewrite (high K), reducing per-frame overhead in the final WAL segments.
- `treemap vlog-rewrite` now wires up dictdb lookup so offline rewrite can decode and re-encode dict-aware frames.
- `treemap vlog-audit` gained a `-frame-stats` mode that reports frame mode/length breakdowns, used for before/after attribution.
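The effect of rebatching can be illustrated with the grouping step alone. `batchFrames` is a hypothetical helper, and `k` plays the role of the rewrite dict batch cap (later raised from 32 to 64 in commit 2af5db3). Real frames would also carry headers, dict IDs, and compressed payloads:

```go
package main

import "fmt"

// batchFrames splits n records into frames of at most k records each and
// returns the record count of each frame. Only the grouping step is modeled.
func batchFrames(n, k int) []int {
	if k <= 0 {
		panic("k must be positive")
	}
	var frames []int
	for n > 0 {
		size := k
		if n < k {
			size = n
		}
		frames = append(frames, size)
		n -= size
	}
	return frames
}

func main() {
	for _, k := range []int{1, 32, 64} {
		fmt.Printf("k=%d -> %d frames\n", k, len(batchFrames(1000, k)))
	}
}
```

It prints 1000, 32, and 16 frames for k = 1, 32, and 64. Fewer, larger frames amortize per-frame overhead, which is the same mechanism behind the grouped frame-count collapse reported in the audits.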

Validation

tests:
- go test ./TreeDB/db ./TreeDB/cmd/treemap -count=1 ✅
benchmark run dir:
- $(cat /tmp/treedb_comp_run_final2_last_path)
baseline for comparison:
- /tmp/treedb_comp_run_profilemedium_xezyaU

Key numbers (post vlog-rewrite, maindb/wal)

fast:
- baseline: 52,020,322 bytes, gzip ratio 0.483236
- new: 38,994,489 bytes, gzip ratio 0.643835
- delta: -25.04% bytes
wal_on_fast:
- baseline: 53,258,105 bytes, gzip ratio 0.461424
- new: 38,969,583 bytes, gzip ratio 0.676189
- delta: -26.83% bytes

Frame-level attribution (post-rewrite audits)

- grouped frame count collapsed from ~315k/324k to ~26k/27k (fast/wal_on_fast) by rebatching dict frames at rewrite time.
- grouped dict stored payload shrank materially, and all values remained readable after reopen.

Copilot

Pull request overview

Copilot reviewed 49 out of 49 changed files in this pull request and generated 2 comments.

TreeDB/cmd/treemap/vlog_audit.go

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc39d8afac

TreeDB/cmd/treemap/vlog_audit.go

snissn · 2026-03-26T16:47:11Z

Compression A/B update: raised the rewrite dict batch cap from k=32 to k=64 (commit 2af5db3), validated on same-input clones for both profiles.

Controlled same-input rewrite deltas:

| profile | pre wal bytes | k32 wal bytes | k64 wal bytes | k64 vs k32 | k32 rewrite (s/rss_kb) | k64 rewrite (s/rss_kb) | k32 gzip ratio | k64 gzip ratio | k32 dict frames | k64 dict frames |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| fast | 61,226,647 | 38,993,579 | 38,420,980 | -572,599 (-1.47%) | 53.73 / 231,040 | 51.83 / 220,056 | 0.644889 | 0.658781 | 20,510 | 10,741 |
| wal_on_fast | 124,995,002 | 38,978,062 | 38,445,903 | -532,159 (-1.37%) | 30.73 / 221,716 | 30.02 / 222,440 | 0.673229 | 0.688458 | 21,073 | 11,746 |

Notes:
- k64 consistently reduces post-rewrite on-disk bytes on identical input for both profiles.
- rewrite wall time is flat to slightly better; rewrite max RSS is roughly unchanged.
- grouped_dict frame count drops ~44-48%, consistent with lower frame metadata overhead.

Artifacts are under /tmp/treedb_rewrite_ab_20260326062636 on the bench host.
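The derived columns can be rechecked from the raw counts. `pctChange` is a hypothetical helper:

```go
package main

import "fmt"

// pctChange returns the relative change from `from` to `to`, in percent.
func pctChange(from, to float64) float64 {
	return 100 * (to - from) / from
}

func main() {
	// k32 vs k64 figures copied from the A/B table.
	rows := []struct {
		profile              string
		k32Bytes, k64Bytes   float64
		k32Frames, k64Frames float64
	}{
		{"fast", 38993579, 38420980, 20510, 10741},
		{"wal_on_fast", 38978062, 38445903, 21073, 11746},
	}
	for _, r := range rows {
		fmt.Printf("%s: wal bytes %.2f%%, dict frames %.1f%%\n",
			r.profile, pctChange(r.k32Bytes, r.k64Bytes), pctChange(r.k32Frames, r.k64Frames))
	}
}
```

It prints -1.47% and -1.37% for WAL bytes, and -47.6% and -44.3% for dict frames (fast and wal_on_fast respectively).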

snissn · 2026-03-26T17:00:19Z

Opened follow-up stacked PR for compression-sanity instrumentation: #850 (base: pr/rss-postfix-polish).

This keeps #848 focused while we iterate on targeted 4KiB / 40-48KiB frame-mode + stored-byte breakdowns in the next PR.

Copilot

Pull request overview

Copilot reviewed 50 out of 50 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

TreeDB/internal/memtable/hash_sorted.go:468

`Get` returns the same tuple in both the pointer and non-pointer branches here, so the `if ent.flags&node.FlagPointer != 0` check is redundant. Consider returning directly, once, after unlocking to keep this path easier to read (and consistent with `GetEntry`, which has distinct pointer/non-pointer behavior).

```go
	if ent.flags&node.FlagPointer != 0 {
		return ent.value, ent.flags&node.FlagTombstone != 0, true
	}
	return ent.value, ent.flags&node.FlagTombstone != 0, true
```
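The simplification the comment suggests, as a self-contained sketch. The flag constants and `entry` type are hypothetical stand-ins for `node.FlagPointer`, `node.FlagTombstone`, and the memtable entry. The real `Get` also unlocks before returning:

```go
package main

import "fmt"

// Hypothetical stand-ins for node.FlagTombstone / node.FlagPointer.
const (
	FlagTombstone uint8 = 1 << iota
	FlagPointer
)

// entry is a hypothetical stand-in for the memtable entry.
type entry struct {
	value []byte
	flags uint8
}

// get returns (value, tombstone, found). Pointer and inline entries yield the
// same tuple, so a single return replaces the FlagPointer branch.
func get(ent *entry) (value []byte, tombstone, found bool) {
	if ent == nil {
		return nil, false, false
	}
	return ent.value, ent.flags&FlagTombstone != 0, true
}

func main() {
	v, tomb, ok := get(&entry{value: []byte("ptr"), flags: FlagPointer})
	fmt.Println(string(v), tomb, ok)
}
```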

TreeDB/caching/vlog_compression_selector.go

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 47470e4ab2

TreeDB/caching/vlog_dict.go

TreeDB/db/vlog_rewrite.go

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 51a41f6cd9

TreeDB/caching/db.go

Copilot

Pull request overview

Copilot reviewed 52 out of 52 changed files in this pull request and generated 1 comment.

TreeDB/db/vlog_rewrite.go

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fe789f9ea9

TreeDB/caching/vlog_dict.go

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1edf57edfc

TreeDB/caching/db.go

Copilot

Pull request overview

Copilot reviewed 52 out of 52 changed files in this pull request and generated no new comments.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ab9c0787e

chatgpt-codex-connector · 2026-03-26T19:38:09Z

TreeDB/cmd/wal_classify/main.go

```diff
+		valueLen := binary.LittleEndian.Uint32(header[16:20])
+		out.rawRecords++
+		out.rawPayloadBytes += int64(valueLen)
+		if valueLen > uint32(cap(payload)) {
+			payload = make([]byte, valueLen)
```


Bound frame length before allocating payload buffer

`scanFrameMeta` trusts the on-disk `valueLen` and uses it directly in `make([]byte, valueLen)`. For corrupted or adversarial value-log input, a forged header can set `valueLen` near 4 GiB, causing `wal_classify -frame-meta` to attempt a huge allocation and OOM instead of reporting corruption. Add a maximum-length guard (for example `limits.MaxRecordSize - valuelog.HeaderSize`) before growing `payload`.

snissn added 4 commits March 24, 2026 14:43

- dict: align helper training with runtime offsets (c6b76dc)
- dict: expand autotune candidate sizes (9589e27)
- caching: flush current vlog tails before backend reads (5540405)
- valuelog: disable current writable mmaps by default (e3c3b14)

Copilot AI review requested due to automatic review settings March 25, 2026 19:52

Copilot started reviewing on behalf of snissn March 25, 2026 19:52

Copilot AI reviewed Mar 25, 2026

TreeDB/caching/db.go (outdated, resolved)

TreeDB/caching/db.go (resolved)

TreeDB/internal/valuelog/manager.go (resolved)

caching: add memory forensics and dict selector signal tuning (8fa4008)

chatgpt-codex-connector bot reviewed Mar 26, 2026

TreeDB/caching/expvar_stats.go (outdated, resolved)

treedb: bias large payloads toward dict when signal is strong (78c4a72)

Copilot AI review requested due to automatic review settings March 26, 2026 09:58

Copilot started reviewing on behalf of snissn March 26, 2026 09:59

Copilot AI reviewed Mar 26, 2026

TreeDB/caching/db.go (outdated, resolved)

TreeDB/caching/db.go (resolved)

chatgpt-codex-connector bot reviewed Mar 26, 2026

TreeDB/caching/db.go (outdated, resolved)

TreeDB/caching/expvar_stats.go (outdated, resolved)

TreeDB: clamp dict probe interval for large payload holds (717c94c)

chatgpt-codex-connector bot reviewed Mar 26, 2026

TreeDB/internal/valuelog/manager.go (resolved)

snissn added 2 commits March 26, 2026 04:54

- unified_bench: enable medium vlog autotune in fast profiles (64e6df3)
- treedb: improve vlog rewrite dict re-encoding and frame audit (fc39d8a)

Copilot AI review requested due to automatic review settings March 26, 2026 15:59

Copilot started reviewing on behalf of snissn March 26, 2026 15:59

Copilot AI reviewed Mar 26, 2026

TreeDB/cmd/treemap/vlog_audit.go (outdated, resolved)

TreeDB/cmd/treemap/vlog_audit.go (outdated, resolved)

chatgpt-codex-connector bot reviewed Mar 26, 2026

TreeDB/cmd/treemap/vlog_audit.go (outdated, resolved)

TreeDB: increase rewrite dict batch cap to 64 (2af5db3)

caching: fix CI flake and address PR848 review feedback (47470e4)

Copilot AI review requested due to automatic review settings March 26, 2026 18:15

Copilot started reviewing on behalf of snissn March 26, 2026 18:16

Copilot AI reviewed Mar 26, 2026

TreeDB/caching/vlog_compression_selector.go (outdated, resolved)

chatgpt-codex-connector bot reviewed Mar 26, 2026

TreeDB/caching/vlog_dict.go (outdated, resolved)

TreeDB/db/vlog_rewrite.go (resolved)

snissn added 2 commits March 26, 2026 08:34

- Fix PR848 CI blockers: Windows open-failure cleanup and perf gate (51a41f6)
- Stabilize Windows rewrite-cancel backoff scheduler test (fe789f9)

chatgpt-codex-connector bot reviewed Mar 26, 2026

TreeDB/caching/db.go (outdated, resolved)

Copilot AI review requested due to automatic review settings March 26, 2026 18:44

Copilot started reviewing on behalf of snissn March 26, 2026 18:44

Copilot AI reviewed Mar 26, 2026

TreeDB/db/vlog_rewrite.go (outdated, resolved)

chatgpt-codex-connector bot reviewed Mar 26, 2026

TreeDB/caching/vlog_dict.go (resolved)

Address remaining PR848 review threads and regressions (1edf57e)

chatgpt-codex-connector bot reviewed Mar 26, 2026

TreeDB/caching/db.go (resolved)

Fix shared-backend read-barrier dispatch across DB owners (6ab9c07)

Copilot AI review requested due to automatic review settings March 26, 2026 19:26

Copilot started reviewing on behalf of snissn March 26, 2026 19:26

Copilot AI reviewed Mar 26, 2026

chatgpt-codex-connector bot reviewed Mar 26, 2026

snissn merged commit 1c46492 into main Mar 26, 2026
21 checks passed
