caching: expose vlog write-mode compression stats #849
Conversation
Pull request overview
This PR expands TreeDB’s observability around value-log compression and memory usage, with supporting tests and expvar export updates, to help validate and tune compression policy based on what is actually written to the value log.
Changes:
- Add per-write-mode (off/block/dict) value-log byte/frame counters, including payload-size bucket breakdowns, and expose them via `DB.Stats()` + expvar filtering.
- Add process memory/RSS stats and combined backend/cache vlog-mmap residency/peak stats, sampled periodically.
- Add a "current-writable read barrier" that flushes buffered value-log tails before backend/internal reads, plus related mmap safety toggles and tests.
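The per-write-mode counters in the first bullet could be modeled roughly as below. This is a minimal sketch under assumed names (`writeMode`, `modeStats`, `payloadBucket` are all illustrative, not the PR's actual identifiers):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// writeMode mirrors the three value-log compression outcomes: off, block, dict.
type writeMode int

const (
	modeOff writeMode = iota
	modeBlock
	modeDict
	modeCount
)

// payloadBucket maps a payload size to a coarse power-of-two bucket index.
func payloadBucket(n int) int {
	b := 0
	for n > 256 && b < 7 {
		n >>= 1
		b++
	}
	return b
}

// modeStats accumulates bytes/frames per write mode, plus a per-mode
// payload-size bucket breakdown, using atomics so writers never block.
type modeStats struct {
	bytes   [modeCount]atomic.Int64
	frames  [modeCount]atomic.Int64
	buckets [modeCount][8]atomic.Int64
}

func (s *modeStats) observe(m writeMode, payloadLen int) {
	s.bytes[m].Add(int64(payloadLen))
	s.frames[m].Add(1)
	s.buckets[m][payloadBucket(payloadLen)].Add(1)
}

func main() {
	var s modeStats
	s.observe(modeDict, 512)
	s.observe(modeBlock, 100)
	fmt.Println(s.frames[modeDict].Load(), s.bytes[modeDict].Load()) // 1 512
}
```

A snapshot method reading these atomics would feed `DB.Stats()` and the expvar filter.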
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| cmd/unified_bench/main.go | Adds the new memory stat keys to the unified bench stats allowlist. |
| TreeDB/internal/valuelog/writer.go | Adds PendingBytes() to support detecting buffered tails without relying on a dirty bit. |
| TreeDB/internal/valuelog/reader_mmap.go | Adds env-gated support for current-writable mmap behavior and adjusts mapping enablement logic. |
| TreeDB/internal/valuelog/mmap_safety_test.go | Ensures mmap safety tests run with current-writable mmap enabled. |
| TreeDB/internal/valuelog/manager.go | Adds current-writable read barrier callback and enforces it on read paths. |
| TreeDB/internal/valuelog/manager_test.go | Tests fallback behavior when current-writable mmap is disabled. |
| TreeDB/internal/valuelog/dict_invalid_offsets_test.go | Adds regression tests around invalid zstd dictionary offsets and debug output. |
| TreeDB/internal/valuelog/dict_compressibility_bench_test.go | Aligns bench dict-building with runtime trainer offsets behavior. |
| TreeDB/internal/valuelog/dict_bench_helper_test.go | Adds tests ensuring bench dict helpers avoid invalid dictionaries. |
| TreeDB/internal/valuelog/dict_autotune_bench_test.go | Aligns autotune bench dict training options with runtime trainer behavior. |
| TreeDB/internal/valuelog/autotune_options_test.go | Updates tests to validate new default candidate history/dict sizes. |
| TreeDB/internal/valuelog/autotune_options.go | Expands default autotune candidate history/dict size sets. |
| TreeDB/internal/memtable/hash_sorted_corruption_test.go | Adds additional corruption recovery tests for HashSorted’s index/map consistency. |
| TreeDB/internal/memtable/hash_sorted.go | Improves stale-index handling on reads and strengthens frozen-index validation. |
| TreeDB/internal/compression/trainer_test.go | Adds test coverage for selecting expanded dict-size candidates. |
| TreeDB/internal/compression/trainer.go | Adds dict-size candidate support, dict shaping/validation, and new max-history cap. |
| TreeDB/db/leaf_page_log.go | Promotes registered vlog segments to current-writable and exposes a backend read-barrier setter. |
| TreeDB/cmd/vlog_dict_realdata/main_dict_test.go | Adds a focused dict-training test for the realdata harness. |
| TreeDB/cmd/vlog_dict_realdata/main.go | Wires CandidateDictBytes and aligns BuildDict offsets with runtime. |
| TreeDB/caching/vlog_dict.go | Passes CandidateDictBytes into the compression trainer. |
| TreeDB/caching/vlog_current_segment_readbarrier_test.go | Adds tests reproducing and preventing buffered-tail visibility holes. |
| TreeDB/caching/vlog_compression_selector_test.go | Adds tests for new per-write-mode stats and bucketed breakdown. |
| TreeDB/caching/vlog_compression_selector.go | Implements write-mode + payload-bucket observation and snapshotting. |
| TreeDB/caching/vlog_autotune_bench.go | Updates bench autotune defaults to the new candidate sets. |
| TreeDB/caching/snapshot.go | Switches snapshot backend reads to use the new flush barrier method. |
| TreeDB/caching/process_memory_other.go | Non-linux stub for RSS sampling. |
| TreeDB/caching/process_memory_linux.go | Linux RSS/HWM sampling via /proc/self/status. |
| TreeDB/caching/memory_stats_test.go | Adds tests for new process RSS and backend vlog-mmap contribution stats. |
| TreeDB/caching/lane.go | Adds per-lane write-mode stats accumulators. |
| TreeDB/caching/expvar_stats_test.go | Expands expvar filter/coercion tests and adds multi-instance expvar test. |
| TreeDB/caching/expvar_stats.go | Adds multi-instance expvar export and expands allowed stat families. |
| TreeDB/caching/db.go | Adds process memory sampler/peaks, new vlog-mmap breakdown stats, write-mode stats export, and read barrier wiring. |
```go
cacheMmap := cacheVlogMmapStatsSnapshot(db.valueLogReader)
backendMmap := backendVlogMmapStatsSnapshot(db.backend.Stats())
updateAtomicMax(&db.processPeakVlogMmapActiveBytes, cacheMmap.activeBytes+backendMmap.activeBytes)
```
sampleProcessMemoryPeaks() calls db.backend.Stats() even when invoked from DB.Stats(), which already fetched stats := db.backend.Stats(). This doubles backend stats collection work (and any internal locking/IO) for every Stats/expvar scrape. Consider passing the already-fetched stats map into sampleProcessMemoryPeaks (or splitting it so peak tracking can reuse the same snapshot) to avoid the redundant backend.Stats call.
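One way to address this is to thread the already-fetched snapshot through. A minimal sketch, assuming a map-shaped backend stats snapshot and a hypothetical key name (`samplePeaksFrom`, `vlog.mmap.active_bytes`, and the plain-int64 peak are all illustrative; the real code uses an atomic-max helper):

```go
package main

import "fmt"

// backendStats stands in for whatever snapshot type db.backend.Stats() returns.
type backendStats map[string]int64

// backendVlogMmapActiveBytes extracts the mmap residency figure from a
// snapshot; the key name here is an assumption for illustration.
func backendVlogMmapActiveBytes(s backendStats) int64 {
	return s["vlog.mmap.active_bytes"]
}

// samplePeaksFrom updates the peak using the caller's snapshot, so DB.Stats()
// can fetch backend stats once and reuse them for peak tracking.
func samplePeaksFrom(s backendStats, peak *int64) {
	if v := backendVlogMmapActiveBytes(s); v > *peak {
		*peak = v
	}
}

func main() {
	stats := backendStats{"vlog.mmap.active_bytes": 4096} // fetched once in DB.Stats()
	var peak int64
	samplePeaksFrom(stats, &peak) // no second backend.Stats() call
	fmt.Println(peak)             // 4096
}
```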
```go
key := db.dir
if key == "" {
	key = fmt.Sprintf("db_%p", db)
}
instanceStats := selectTreeDBExpvarStats(db.Stats())
instanceStats["treedb.expvar.wal_dir"] = db.dir
instanceStats["treedb.expvar.is_current"] = current == db
instances[key] = instanceStats
```
instances is keyed by db.dir, so opening multiple DB instances pointing at the same directory will cause later entries to overwrite earlier ones (silently dropping instances from expvar output). Consider using a collision-proof key (e.g., include the pointer address or a monotonic instance ID) even when db.dir is non-empty, and/or make the value a list when duplicates occur.
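A collision-proof key along the lines suggested might look like this. The `db` struct and `instanceKey` helper are hypothetical stand-ins for the real types:

```go
package main

import "fmt"

// db is an illustrative stand-in for the real DB type.
type db struct{ dir string }

// instanceKey combines the directory (when set) with the instance pointer,
// so two DBs opened on the same directory never collide in the expvar map.
func instanceKey(d *db) string {
	if d.dir == "" {
		return fmt.Sprintf("db_%p", d)
	}
	return fmt.Sprintf("%s#%p", d.dir, d)
}

func main() {
	a, b := &db{dir: "/data/x"}, &db{dir: "/data/x"}
	fmt.Println(instanceKey(a) != instanceKey(b)) // true: distinct keys for the same dir
}
```

A monotonic instance ID would work equally well and gives stabler keys across scrapes than a pointer, at the cost of a global counter.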
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b74dee7ca1
```go
} else if len(dict) < dictBytes {
	shaped = make([]byte, dictBytes)
	copy(shaped, dict)
```
Cap dictionary candidate sizes before shaping dicts
CandidateDictBytes entries are now accepted without any upper bound, and shapeAndValidateDict allocates make([]byte, dictBytes) when a candidate is larger than the trained dict. A misconfigured candidate size (for example, an MB/GB typo in autotune options) will allocate very large buffers for every history candidate during training, which can trigger OOMs or severe GC pauses. Please clamp dict candidates to a sane maximum (similar to the history-size cap) before this allocation path.
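The suggested clamp could be applied before the shaping path runs. A minimal sketch, assuming an illustrative 1 MiB cap (the real trainer would pick its own limit, analogous to its history-size cap):

```go
package main

import "fmt"

// maxDictBytes is an assumed upper bound for illustration only.
const maxDictBytes = 1 << 20 // 1 MiB

// clampDictCandidates drops non-positive entries and clamps oversized ones,
// so a misconfigured candidate (e.g. an MB/GB typo) can never reach the
// make([]byte, dictBytes) allocation in the shaping path.
func clampDictCandidates(candidates []int) []int {
	out := candidates[:0]
	for _, c := range candidates {
		if c <= 0 {
			continue
		}
		if c > maxDictBytes {
			c = maxDictBytes
		}
		out = append(out, c)
	}
	return out
}

func main() {
	// A GB-scale typo is clamped instead of driving a huge allocation.
	fmt.Println(clampDictCandidates([]int{4096, 1 << 30})) // [4096 1048576]
}
```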
Summary
Why
We need direct evidence for which value-log bytes are actually written as dict vs block vs off, and in which payload ranges, before changing compression policy.
Validation