valuelog: remove small decode-scratch boxing churn #833
snissn wants to merge 7 commits into `pr/vlog-sealed-lazy-mmap-stat-churn` from …
Pull request overview
Eliminates sync.Pool interface-boxing allocations on the hot decode-scratch path by wrapping []byte in a typed *decodeScratchHolder struct. Also changes decodeRecordTo to accept *[HeaderSize]byte to avoid slice escape.
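Why boxing allocates: a `[]byte` is a three-word slice header (pointer, length, capacity), so storing it in an `any` (the parameter type of `sync.Pool.Put`) forces the header onto the heap, whereas a pointer fits directly in the interface's data word. A minimal sketch that measures both with `testing.AllocsPerRun`; `decodeScratchHolder` mirrors the name used above, but its single `buf` field is an assumption, not the definition in `reader.go`.

```go
package main

import (
	"fmt"
	"testing"
)

// decodeScratchHolder is an assumed shape: a struct that owns the scratch slice.
type decodeScratchHolder struct {
	buf []byte
}

// sink is a package-level interface so the conversions below really escape.
var sink any

// boxSliceAllocs reports allocations per run when a []byte is stored in an any.
func boxSliceAllocs() float64 {
	buf := make([]byte, 64)
	return testing.AllocsPerRun(100, func() {
		sink = buf // copies the slice header into a fresh heap allocation
	})
}

// boxHolderAllocs reports allocations per run when a *decodeScratchHolder is stored.
func boxHolderAllocs() float64 {
	h := &decodeScratchHolder{buf: make([]byte, 64)}
	return testing.AllocsPerRun(100, func() {
		sink = h // a pointer fits in the interface data word: no allocation
	})
}

func main() {
	fmt.Println("[]byte -> any allocs/op:", boxSliceAllocs())
	fmt.Println("*holder -> any allocs/op:", boxHolderAllocs())
}
```

The same per-`Put` allocation is what `sync.Pool.Put([]byte)` pays on every call; wrapping the slice once in a long-lived holder moves that cost out of the hot path.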
Changes:
- Replace direct `sync.Pool` of `[]byte` with a typed `*decodeScratchHolder` pool to avoid interface-boxing allocations
- Change the `decodeRecordTo` signature from `header []byte` to `header *[HeaderSize]byte` to prevent slice escape on the hot read path
- Add the `TestDecodeScratchPool_ReusesSmallBuffer_NoAllocsAfterWarmup` regression test
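A sketch of what a fixed-size header parameter buys: the length is part of the type, so constant-offset accesses need no bounds checks, and a caller that only has a `[]byte` can still pass it with a Go 1.17+ slice-to-array-pointer conversion. It assumes a hypothetical 8-byte header (little-endian `uint32` value length, then `uint32` checksum); the real `HeaderSize` and layout are not given in this PR text, and `decodeHeader`/`encodeHeader` are illustrative names, not the `decodeRecordTo` API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// HeaderSize and the layout below are hypothetical, for illustration only.
const HeaderSize = 8

// decodeHeader reads fields from a fixed-size header. Because the array
// length is known at compile time, these constant-offset slices need no bounds checks.
func decodeHeader(h *[HeaderSize]byte) (valueLen, checksum uint32) {
	valueLen = binary.LittleEndian.Uint32(h[0:4])
	checksum = binary.LittleEndian.Uint32(h[4:8])
	return valueLen, checksum
}

// encodeHeader builds a header with the same hypothetical layout.
func encodeHeader(valueLen, checksum uint32) [HeaderSize]byte {
	var h [HeaderSize]byte
	binary.LittleEndian.PutUint32(h[0:4], valueLen)
	binary.LittleEndian.PutUint32(h[4:8], checksum)
	return h
}

func main() {
	h := encodeHeader(100, 0xdeadbeef)
	n, sum := decodeHeader(&h)
	fmt.Println(n, sum)

	// A caller holding a []byte converts without copying (Go 1.17+);
	// the conversion panics if len(rec) < HeaderSize.
	rec := append(h[:], 'v', 'a', 'l')
	n2, _ := decodeHeader((*[HeaderSize]byte)(rec))
	fmt.Println(n2)
}
```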
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| TreeDB/internal/valuelog/reader.go | Typed holder pool for small scratch buffers; fixed-size array pointer for header; min cap hint |
| TreeDB/internal/valuelog/decode_scratch_pool_test.go | Allocation regression test for small buffer reuse |
Follow-up pushed:
Reason:
Validation run:
Throughput deltas (TreeDB):
Alloc deltas:
Current random-read alloc profiles are now dominated by profiling/runtime artifacts (…).
Additional validation on this PR head (…):
fast
wal_on_fast
Diagnostics artifacts are present in each run’s …
Pushed
Local validation:
Pull request overview
This PR eliminates sync.Pool interface-boxing overhead for small decode scratch buffers in the valuelog read path by introducing a typed decodeScratchHolder wrapper pool pattern. It also pools the fixed-size header array used in ReadAtWithDictTo and relaxes a flaky drain-budget test assertion.
Changes:
- Replace direct `sync.Pool` `[]byte` usage with a typed `*decodeScratchHolder` two-pool pattern to avoid interface-boxing allocations on the hot read path
- Pool the `[HeaderSize]byte` header in `ReadAtWithDictTo` and change `decodeRecordTo` to accept `*[HeaderSize]byte` to avoid slice escape
- Add Windows skip guards for mmap tests and relax the `TestDrainBatchArenaPoolToTargetBytes` tolerance
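One plausible reading of the "two-pool pattern": one pool holds holders that currently carry a buffer, and a second recycles the emptied holders, so handing a buffer back never allocates a wrapper. This is a hedged sketch, not the code in `reader.go`; the names `getDecodeScratch`/`putDecodeScratch` come from the PR, but their signatures, `minScratchCap`, and the drop-if-too-small policy are assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// decodeScratchHolder wraps a scratch buffer so the pools store pointers, not slices.
type decodeScratchHolder struct {
	buf []byte
}

// minScratchCap is a hypothetical minimum capacity hint for fresh buffers.
const minScratchCap = 256

var (
	// scratchPool holds holders that currently own a reusable buffer.
	scratchPool sync.Pool
	// holderPool recycles empty holders so putDecodeScratch need not allocate one.
	holderPool = sync.Pool{New: func() any { return new(decodeScratchHolder) }}
)

// getDecodeScratch returns a buffer of length n, reusing a pooled one when it is big enough.
func getDecodeScratch(n int) []byte {
	if h, ok := scratchPool.Get().(*decodeScratchHolder); ok {
		buf := h.buf
		h.buf = nil // the empty holder must not keep the buffer reachable
		holderPool.Put(h)
		if cap(buf) >= n {
			return buf[:n]
		}
		// Too small for this request: let it be collected and allocate below.
	}
	c := n
	if c < minScratchCap {
		c = minScratchCap
	}
	return make([]byte, n, c)
}

// putDecodeScratch parks buf in an empty holder and returns it to the pool.
func putDecodeScratch(buf []byte) {
	h := holderPool.Get().(*decodeScratchHolder)
	h.buf = buf[:0]
	scratchPool.Put(h) // a pointer in an interface: no boxing allocation
}

// steadyStateAllocs measures a get/put cycle; AllocsPerRun runs f once as warm-up first.
func steadyStateAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		b := getDecodeScratch(64)
		b[0] = 1
		putDecodeScratch(b)
	})
}

func main() {
	b := getDecodeScratch(64)
	fmt.Println(len(b), cap(b))
	putDecodeScratch(b)
	fmt.Println("allocs per get/put:", steadyStateAllocs())
}
```

Note that `sync.Pool` deliberately drops items under the race detector, so a zero-allocation assertion like this one is only meaningful in non-race builds.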
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| TreeDB/internal/valuelog/reader.go | Typed holder pool for decode scratch, header scratch pool, decodeRecordTo signature change |
| TreeDB/internal/valuelog/decode_scratch_pool_test.go | New zero-alloc regression test for small buffer reuse |
| TreeDB/internal/valuelog/manager_test.go | Windows skip guards for mmap tests |
| TreeDB/caching/batch_arena_budget_test.go | Relax drain assertion to tolerate one class of residual |
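The relaxed drain assertion can be read as a bound: after draining toward a byte target, up to one size class may remain above it, since the `sync.Pool` documentation allows any stored item to be removed or retained without notice. A hedged sketch; `withinDrainBudget`, its parameters, and the exact bound are assumptions about the test's intent, not its code.

```go
package main

import "fmt"

// withinDrainBudget reports whether retained bytes are acceptable after a drain
// toward target, tolerating at most one class-sized residual (assumed rule).
func withinDrainBudget(retained, target, classSize int64) bool {
	return retained <= target+classSize
}

func main() {
	const classSize = 4096 // hypothetical size class
	fmt.Println(withinDrainBudget(0, 0, classSize))    // fully drained
	fmt.Println(withinDrainBudget(4096, 0, classSize)) // one class left over
	fmt.Println(withinDrainBudget(8192, 0, classSize)) // two classes: still a failure
}
```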
Pull request overview
This PR reduces allocation churn in the internal/valuelog random-read hot path by eliminating []byte interface-boxing in the small decode-scratch sync.Pool and by keeping the record header on a fixed-size pointer to avoid slice escapes.
Changes:
- Switch small decode-scratch pooling from a `sync.Pool` of `[]byte` to a typed holder approach to avoid interface-boxing allocations.
- Introduce a pooled `*[HeaderSize]byte` header scratch path and update `decodeRecordTo` to accept a fixed-size header pointer.
- Add an allocation regression test for small scratch reuse; relax a batch-arena pool accounting test expectation to be resilient across `sync.Pool` implementations/visibility.
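A pooled header path might look like this: a `sync.Pool` of `*[HeaderSize]byte` feeding an `io.ReaderAt`. Slicing a local array and passing it to an interface method such as `ReadAt` makes the array escape to the heap (the compiler cannot see what the implementation does with the slice), so borrowing an already-heap array from the pool keeps the steady-state cost at zero. `readHeaderAt`, `newRecord`, `valueLenOf`, the 8-byte `HeaderSize`, and the header layout are all hypothetical; `ReadAtWithDictTo` itself is not reproduced here.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"sync"
)

// HeaderSize is a hypothetical header length used only for this sketch.
const HeaderSize = 8

// headerPool hands out fixed-size header arrays; storing the pointer in the
// pool's interface does not allocate.
var headerPool = sync.Pool{New: func() any { return new([HeaderSize]byte) }}

// readHeaderAt reads the header at off and returns its (hypothetical) value length.
func readHeaderAt(r io.ReaderAt, off int64) (uint32, error) {
	hdr := headerPool.Get().(*[HeaderSize]byte)
	defer headerPool.Put(hdr)
	if _, err := r.ReadAt(hdr[:], off); err != nil {
		return 0, err
	}
	// The return value is computed before the deferred Put runs.
	return binary.LittleEndian.Uint32(hdr[0:4]), nil
}

// newRecord builds a hypothetical record: header (value length) followed by the value.
func newRecord(value string) []byte {
	rec := make([]byte, HeaderSize+len(value))
	binary.LittleEndian.PutUint32(rec[0:4], uint32(len(value)))
	copy(rec[HeaderSize:], value)
	return rec
}

// valueLenOf reads the header of an in-memory record.
func valueLenOf(rec []byte) (uint32, error) {
	return readHeaderAt(bytes.NewReader(rec), 0)
}

func main() {
	n, err := valueLenOf(newRecord("abc"))
	fmt.Println(n, err)
}
```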
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| TreeDB/internal/valuelog/reader.go | Reworks decode scratch pooling to avoid []byte boxing and pools fixed-size header scratch; updates decodeRecordTo signature accordingly. |
| TreeDB/internal/valuelog/manager_test.go | Skips mmap-dependent tests on Windows where mmap is not supported. |
| TreeDB/internal/valuelog/decode_scratch_pool_test.go | Adds a small-buffer allocation regression test using testing.AllocsPerRun. |
| TreeDB/caching/batch_arena_budget_test.go | Allows one class-sized residual in drain accounting to reduce platform/runtime sensitivity. |
…ecode-scratch-holder-pool
Pull request overview
This PR reduces allocation churn in the valuelog read/decode hot paths by avoiding []byte interface boxing in sync.Pool and by pooling a fixed-size header scratch buffer.
Changes:
- Replace small decode-scratch `[]byte` pooling with a typed holder-based pooling strategy to avoid interface-boxing allocations.
- Pool `*[HeaderSize]byte` for `ReadAtWithDictTo` header reads to avoid slice escape/allocation in the hot path.
- Add an allocation regression test for small-buffer decode scratch reuse; adjust mmap-related tests to skip on Windows; loosen a batch-arena budget assertion to account for `sync.Pool` variability.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| TreeDB/internal/valuelog/reader.go | Introduces holder-based scratch pool and header scratch pooling; updates decode API to accept *[HeaderSize]byte. |
| TreeDB/internal/valuelog/manager_test.go | Skips mmap budget tests on Windows where mmap isn’t supported. |
| TreeDB/internal/valuelog/decode_scratch_pool_test.go | Adds an allocation regression test for small decode scratch pool reuse. |
| TreeDB/caching/batch_arena_budget_test.go | Relaxes post-drain accounting assertion to tolerate class-sized residuals. |
Summary
- Remove small decode-scratch `[]byte` interface-boxing churn by replacing direct `sync.Pool` `[]byte` puts with a typed holder pool
- Add an allocation regression test (`TestDecodeScratchPool_ReusesSmallBuffer_NoAllocsAfterWarmup`)
- Keep the `ReadAtWithDictTo` header path on a fixed-size pointer (`*[HeaderSize]byte`) to avoid slice escape in the hot path
Why
`alloc_objects` and `alloc_space` for random-read workloads were still heavily split between:
- `ReadAtWithDictTo`
- `putDecodeScratch`
The `putDecodeScratch` side was largely interface-boxing churn from `sync.Pool.Put([]byte)` on the hot path.
Validation
- `go test ./TreeDB/internal/valuelog -count=1`
- `go vet ./TreeDB/internal/valuelog`
- `GOWORK=off make unified-bench`
- `time ./bin/unified-bench -dbs treedb -profile fast -keys 500000 -progress=false -treedb-index-outer-leaves-in-vlog=true -valsize=100 -treedb-force-value-pointers=false -profile-dir=/home/mikers/tmp/perf-1773576977 -checkpoint-between-tests`
Baseline comparison run (same branch baseline before this PR):
- `/home/mikers/tmp/perf-1773575465`
Throughput (TreeDB)
- 234,557 -> 236,109 (+0.66%)
- 1,667,356 -> 1,692,842 (+1.53%)
- 1,436,009 -> 1,512,065 (+5.30%)
- 1,186,844 -> 1,200,989 (+1.19%)
Allocation reduction (random_read_parallel)
- `alloc_objects`: 11,895,488 -> 5,755,107 (-51.62%)
- `alloc_space`: 285.11MB -> 142.59MB (-49.99%)
Allocation reduction (random_read)
- `alloc_objects`: 967,946 -> 642,231 (-33.65%)
- `alloc_space`: 35,931.20kB -> 26,671.85kB (-25.77%)
The `putDecodeScratch` hotspot drops out of the random-read top allocators; `ReadAtWithDictTo` remains the dominant read-path allocator.
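The percentages above are relative changes, (after - before) / before * 100, rounded to two decimals. A small check against a few of the reported figures; `pctDelta` and `round2` are illustrative helpers only.

```go
package main

import (
	"fmt"
	"math"
)

// pctDelta returns the relative change from before to after, in percent.
func pctDelta(before, after float64) float64 {
	return (after - before) / before * 100
}

// round2 rounds to two decimal places, matching the precision reported above.
func round2(x float64) float64 {
	return math.Round(x*100) / 100
}

func main() {
	fmt.Println(round2(pctDelta(11895488, 5755107))) // random_read_parallel object count
	fmt.Println(round2(pctDelta(285.11, 142.59)))    // random_read_parallel space, MB
	fmt.Println(round2(pctDelta(1436009, 1512065)))  // largest throughput gain
}
```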