treemap: add focused vlog compression breakdown instrumentation#850

Open
snissn wants to merge 1 commit into pr/rss-postfix-polish from pr/compression-sanity-followup

Conversation

Owner

@snissn commented Mar 26, 2026

Summary

  • extend treemap vlog-audit -frame-stats with stored-byte attribution per record length
  • add explicit focus breakdowns for the main offender ranges: exact 4KiB and 40-48KiB
  • report per-focus mode composition (grouped_dict, grouped_block_*, grouped_raw/raw_ungrouped) with raw/stored bytes and ratio
  • include stored bytes + ratio in top record-length output to make codec quality visible by length
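
The two focus ranges above could be bucketed with a small helper along these lines (a sketch only: the function name, bucket labels, and the half-open 48KiB upper bound are assumptions, not the PR's actual code):

```go
package main

import "fmt"

// focusRange buckets a raw record length into the audit's focus ranges.
// The name, labels, and half-open upper bound are illustrative guesses;
// the PR names the ranges (exact 4KiB, 40-48KiB) but not this code.
func focusRange(rawLen int) string {
	switch {
	case rawLen == 4<<10:
		return "exact_4KiB"
	case rawLen >= 40<<10 && rawLen < 48<<10:
		return "40-48KiB"
	default:
		return "other"
	}
}

func main() {
	for _, n := range []int{4096, 40960, 47000, 8192} {
		fmt.Printf("%d -> %s\n", n, focusRange(n))
	}
}
```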

Why

We need a precise answer to:

  1. which frame modes are actually carrying the 40-48KiB values
  2. how many stored bytes each mode contributes in those ranges

This removes guesswork before tuning dict coverage/policy for post-rewrite disk size.

Implementation notes

  • grouped-frame stored bytes are apportioned to subrecords by raw-length share, conserving totals exactly
  • JSON schema gains additive fields only (*_stored_bytes, *_modes, stored_bytes on top lengths)
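
The apportioning in the first bullet can be sketched as follows; the signature and rounding details are inferred from the fragment quoted in the review on this page, so treat this as an approximation of the PR's apportionStoredBytesByRaw rather than its exact code:

```go
package main

import "fmt"

// apportionStoredBytesByRaw splits a grouped frame's stored (compressed)
// byte total across subrecords in proportion to raw length. All integer
// truncation is absorbed by the final subrecord's share, so the shares
// always sum back to storedTotal exactly.
func apportionStoredBytesByRaw(rawLens []int64, storedTotal int64) []int64 {
	shares := make([]int64, len(rawLens))
	if len(rawLens) == 0 {
		return shares
	}
	var rawTotal int64
	for _, n := range rawLens {
		rawTotal += n
	}
	remaining := storedTotal
	for i := 0; i < len(rawLens)-1; i++ {
		var share int64
		if rawTotal > 0 {
			share = storedTotal * rawLens[i] / rawTotal
		}
		shares[i] = share
		remaining -= share
	}
	shares[len(shares)-1] = remaining
	return shares
}

func main() {
	fmt.Println(apportionStoredBytesByRaw([]int64{4096, 45056}, 10000))
}
```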

Tests

  • go test ./TreeDB/cmd/treemap -count=1
  • go test ./TreeDB/db -run TestValueLogRewrite -count=1

Copilot AI review requested due to automatic review settings March 26, 2026 16:59

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 893004475d


Comment on lines +537 to +540
shares[i] = share
remaining -= share
}
shares[len(shares)-1] = remaining


P2 Badge Allocate frame remainder without record-order bias

apportionStoredBytesByRaw gives every leftover byte to the final subrecord, so when storedTotal is small relative to record count (or many shares round to zero), the last record can absorb most/all stored bytes regardless of its size. In mixed-length grouped frames this makes per-length and focus-mode ratios depend on subrecord order instead of payload composition, which can materially skew the new 4KiB and 40–48KiB attribution output used for compression tuning.
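
One order-neutral alternative (a sketch only, not a claim about how the PR should resolve this) is largest-remainder apportionment: compute floored proportional shares, then hand each leftover byte to the subrecord with the next-largest fractional remainder instead of dumping them all on the last record.

```go
package main

import (
	"fmt"
	"sort"
)

// apportionLargestRemainder splits storedTotal across subrecords in
// proportion to rawLens, then assigns each leftover byte to the
// subrecord with the next-largest fractional remainder (ties broken
// by index), so no position in the frame is systematically favored.
func apportionLargestRemainder(rawLens []int64, storedTotal int64) []int64 {
	n := len(rawLens)
	shares := make([]int64, n)
	if n == 0 {
		return shares
	}
	var rawTotal int64
	for _, l := range rawLens {
		rawTotal += l
	}
	if rawTotal == 0 {
		shares[n-1] = storedTotal // degenerate frame: still conserve totals
		return shares
	}
	order := make([]int, n)
	fracs := make([]int64, n)
	var assigned int64
	for i, l := range rawLens {
		num := storedTotal * l
		shares[i] = num / rawTotal
		fracs[i] = num % rawTotal // fractional part, denominator rawTotal
		order[i] = i
		assigned += shares[i]
	}
	sort.Slice(order, func(a, b int) bool {
		ia, ib := order[a], order[b]
		if fracs[ia] != fracs[ib] {
			return fracs[ia] > fracs[ib]
		}
		return ia < ib
	})
	// At most n-1 bytes remain after flooring, one per winner.
	for k := int64(0); k < storedTotal-assigned; k++ {
		shares[order[k]]++
	}
	return shares
}

func main() {
	// With 2 stored bytes over a 1000/1/1 raw split, both bytes land on
	// the large record; a give-remainder-to-last scheme would instead
	// hand one byte to the final 1-byte record.
	fmt.Println(apportionLargestRemainder([]int64{1000, 1, 1}, 2))
}
```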


Contributor

Copilot AI left a comment


Pull request overview

This PR enhances treemap’s value-log (vlog) audit instrumentation to attribute stored bytes (post-compression) to record lengths and to provide focused breakdowns for key offender ranges (exact 4KiB and 40–48KiB), including per-mode composition and ratios.

Changes:

  • Extend record-length aggregation to track stored_bytes alongside raw bytes and emit ratio output.
  • Add focused counters and per-mode composition maps for 4KiB and 40–48KiB ranges (raw/stored/ratio).
  • Implement stored-bytes apportioning for grouped frames across subrecords by raw-length share, plus tests for conservation and focus breakdowns.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Files changed:

  • TreeDB/cmd/treemap/vlog_audit.go: adds stored-bytes attribution, focus-range mode breakdowns, printing helpers, and apportioning logic for grouped frames.
  • TreeDB/cmd/treemap/vlog_audit_test.go: updates existing expectations to include stored bytes and adds tests for focus breakdowns and apportioning invariants.


}
}

func TestScanValueLogFrames_FocusModeBreakdownForUngroupedLengths(t *testing.T) {

Copilot AI Mar 26, 2026


Test name says "UngroupedLengths", but the value-log writer path used here writes grouped frames (k=1/2) and the assertions explicitly allow grouped_raw. Consider renaming the test to reflect that it validates focus breakdown by record length regardless of grouping, to avoid misleading future readers.

Suggested change
func TestScanValueLogFrames_FocusModeBreakdownForUngroupedLengths(t *testing.T) {
func TestScanValueLogFrames_FocusModeBreakdownByRecordLength(t *testing.T) {

Owner Author

snissn commented Mar 26, 2026

Next Steps

I recommend we now use the #850 instrumentation to run one controlled fast + wal_on_fast post-rewrite capture, then pick the next policy/codec change specifically for the dominant 40–48KiB mode path. Once we have that delta, I'll implement the first targeted compression policy change in #850. I'm also checking local benchmark artifacts for the dict-setting sweeps you referenced (including dict_zstd_fastest_noentropy_k1) and separating that from what's currently exposed as runtime knobs in code.

