treemap: add focused vlog compression breakdown instrumentation#850
treemap: add focused vlog compression breakdown instrumentation#850snissn wants to merge 1 commit intopr/rss-postfix-polishfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 893004475d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| shares[i] = share | ||
| remaining -= share | ||
| } | ||
| shares[len(shares)-1] = remaining |
There was a problem hiding this comment.
Allocate frame remainder without record-order bias
apportionStoredBytesByRaw gives every leftover byte to the final subrecord, so when storedTotal is small relative to record count (or many shares round to zero), the last record can absorb most/all stored bytes regardless of its size. In mixed-length grouped frames this makes per-length and focus-mode ratios depend on subrecord order instead of payload composition, which can materially skew the new 4KiB and 40–48KiB attribution output used for compression tuning.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR enhances treemap’s value-log (vlog) audit instrumentation to attribute stored bytes (post-compression) to record lengths and to provide focused breakdowns for key offender ranges (exact 4KiB and 40–48KiB), including per-mode composition and ratios.
Changes:
- Extend record-length aggregation to track
stored_bytesalongside raw bytes and emit ratio output. - Add focused counters and per-mode composition maps for 4KiB and 40–48KiB ranges (raw/stored/ratio).
- Implement stored-bytes apportioning for grouped frames across subrecords by raw-length share, plus tests for conservation and focus breakdowns.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| TreeDB/cmd/treemap/vlog_audit.go | Adds stored-bytes attribution, focus-range mode breakdowns, printing helpers, and apportioning logic for grouped frames. |
| TreeDB/cmd/treemap/vlog_audit_test.go | Updates existing expectations to include stored bytes and adds tests for focus breakdowns and apportioning invariants. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
| } | ||
|
|
||
| func TestScanValueLogFrames_FocusModeBreakdownForUngroupedLengths(t *testing.T) { |
There was a problem hiding this comment.
Test name says "UngroupedLengths", but the value-log writer path used here writes grouped frames (k=1/2) and the assertions explicitly allow grouped_raw. Consider renaming the test to reflect that it validates focus breakdown by record length regardless of grouping, to avoid misleading future readers.
| func TestScanValueLogFrames_FocusModeBreakdownForUngroupedLengths(t *testing.T) { | |
| func TestScanValueLogFrames_FocusModeBreakdownByRecordLength(t *testing.T) { |
|
Next Steps |
Summary
treemap vlog-audit -frame-statswith stored-byte attribution per record lengthgrouped_dict,grouped_block_*,grouped_raw/raw_ungrouped) with raw/stored bytes and ratioWhy
We need a precise answer to:
This removes guesswork before tuning dict coverage/policy for post-rewrite disk size.
Implementation notes
*_stored_bytes,*_modes,stored_byteson top lengths)Tests
go test ./TreeDB/cmd/treemap -count=1go test ./TreeDB/db -run TestValueLogRewrite -count=1