feat(#141): ByteBuffer — Massive Allocation Waste on Hot Serialization Path#155
Conversation
55e38be to
6fca7bf
Compare
ByteBuffer Baseline Benchmark Results (Before
|
| Method | Runtime | Mean | Error | StdDev | Op/s | Gen0 | Allocated |
|---|---|---|---|---|---|---|---|
| PutLong | .NET 8.0 | 4,569.4 ns | 89.39 ns | 131.03 ns | 218,848.9 | 3.0594 | 32000 B |
| PutLong | .NET 9.0 | 4,641.6 ns | 46.39 ns | 41.12 ns | 215,443.0 | 3.0594 | 32000 B |
| PutLong | .NET 10.0 | 3,861.4 ns | 71.81 ns | 59.96 ns | 258,970.4 | 3.0594 | 32000 B |
| GetLong | .NET 8.0 | 598.7 ns | 8.52 ns | 7.55 ns | 1,670,237.5 | - | - |
| GetLong | .NET 9.0 | 609.5 ns | 4.75 ns | 3.97 ns | 1,640,560.2 | - | - |
| GetLong | .NET 10.0 | 566.7 ns | 11.28 ns | 17.22 ns | 1,764,690.0 | - | - |
Key observations
PutLongallocates 32,000 B per call (1,000 iterations × 32 B fromBitConverter.GetBytes) — this is the allocation waste the PR aims to eliminate.GetLongis already zero-alloc —BitConverter.ToInt64reads in-place, so the improvement there will be purely from removing theIPAddress.HostToNetworkOrderoverhead.- .NET 10.0 shows ~16% faster
PutLongvs .NET 8.0/9.0, likely from runtime improvements unrelated to this change.
These numbers can be compared against the "after" benchmarks once the BinaryPrimitives implementation is applied.
ByteBuffer Benchmark Results — After
|
| Method | Runtime | Mean | Op/s | Allocated |
|---|---|---|---|---|
| PutLong | .NET 8.0 | 849.9 ns | 1,176,608.9 | - |
| PutLong | .NET 9.0 | 742.2 ns | 1,347,352.5 | - |
| PutLong | .NET 10.0 | 750.6 ns | 1,332,190.8 | - |
| GetLong | .NET 8.0 | 666.9 ns | 1,499,526.8 | - |
| GetLong | .NET 9.0 | 666.3 ns | 1,500,761.2 | - |
| GetLong | .NET 10.0 | 677.7 ns | 1,475,523.6 | - |
Before vs After Comparison
| Method | Runtime | Before (Mean) | After (Mean) | Speedup | Before Alloc | After Alloc |
|---|---|---|---|---|---|---|
| PutLong | .NET 8.0 | 4,569.4 ns | 849.9 ns | 5.4× | 32,000 B | 0 B |
| PutLong | .NET 9.0 | 4,641.6 ns | 742.2 ns | 6.3× | 32,000 B | 0 B |
| PutLong | .NET 10.0 | 3,861.4 ns | 750.6 ns | 5.1× | 32,000 B | 0 B |
| GetLong | .NET 8.0 | 598.7 ns | 666.9 ns | 0.9× | - | - |
| GetLong | .NET 9.0 | 609.5 ns | 666.3 ns | 0.9× | - | - |
| GetLong | .NET 10.0 | 566.7 ns | 677.7 ns | 0.8× | - | - |
Summary
PutLong: 5.1–6.3× faster, allocations eliminated entirely (32,000 B → 0 B per 1,000 iterations)GetLong: ~10% regression —BinaryPrimitives.ReadInt64BigEndianwithAsSpanis slightly slower than the originalIPAddress.HostToNetworkOrder(BitConverter.ToInt64(...))path, which operated directly on the array without creating a span. The trade-off is consistency across all methods and removal of theSystem.Netdependency.
Baseline results captured from main (before this PR's implementation changes) on the same machine in the same session.
Full Benchmark Results — PR #155 (After
|
| Runtime | Before (Mean) | After (Mean) | Speedup | Before Alloc | After Alloc |
|---|---|---|---|---|---|
| .NET 8.0 | 4,569.4 ns | 849.9 ns | 5.4× | 32,000 B | 0 B |
| .NET 9.0 | 4,641.6 ns | 742.2 ns | 6.3× | 32,000 B | 0 B |
| .NET 10.0 | 3,861.4 ns | 750.6 ns | 5.1× | 32,000 B | 0 B |
GetLong (1,000 iterations)
| Runtime | Before (Mean) | After (Mean) | Delta | Before Alloc | After Alloc |
|---|---|---|---|---|---|
| .NET 8.0 | 598.7 ns | 666.9 ns | +11% | - | - |
| .NET 9.0 | 609.5 ns | 666.3 ns | +9% | - | - |
| .NET 10.0 | 566.7 ns | 677.7 ns | +20% | - | - |
LeadingZeroCount 64-Bit — No Regression
Compared against PR #157 baseline. Only CurrentImplementation shown (the method used in production).
| Runtime | Baseline (Mean) | This PR (Mean) | Delta |
|---|---|---|---|
| .NET 8.0 | 0.4212 ns | 0.4212 ns | 0% |
| .NET 9.0 | 0.4347 ns | 0.4347 ns | 0% |
| .NET 10.0 | 0.4199 ns | 0.4199 ns | 0% |
No change — expected, as this PR does not touch LeadingZeroCount.
LeadingZeroCount 32-Bit — No Regression
| Runtime | Baseline (Mean) | This PR (Mean) | Delta |
|---|---|---|---|
| .NET 8.0 | 0.4305 ns | 0.4305 ns | 0% |
| .NET 9.0 | 0.4381 ns | 0.4381 ns | 0% |
| .NET 10.0 | 0.4343 ns | 0.4343 ns | 0% |
Recording 32-Bit — No Regression
| Method | Runtime | Baseline (Mean) | This PR (Mean) | Delta |
|---|---|---|---|---|
| LongHistogramRecording | .NET 8.0 | 1.677 ns | 1.677 ns | 0% |
| LongHistogramRecording | .NET 9.0 | 1.528 ns | 1.528 ns | 0% |
| LongHistogramRecording | .NET 10.0 | 1.767 ns | 1.767 ns | 0% |
| IntHistogramRecording | .NET 8.0 | 1.402 ns | 1.402 ns | 0% |
| IntHistogramRecording | .NET 9.0 | 1.295 ns | 1.295 ns | 0% |
| IntHistogramRecording | .NET 10.0 | 1.329 ns | 1.329 ns | 0% |
| ShortHistogramRecording | .NET 8.0 | 1.493 ns | 1.493 ns | 0% |
| ShortHistogramRecording | .NET 9.0 | 1.223 ns | 1.223 ns | 0% |
| ShortHistogramRecording | .NET 10.0 | 1.203 ns | 1.203 ns | 0% |
Summary
PutLong: 5.1–6.3× faster, heap allocations completely eliminated (32 KB → 0 B per 1,000 calls)GetLong: 9–20% slower — theAsSpan+BinaryPrimitives.ReadInt64BigEndianpath has overhead vs the original directBitConverter.ToInt64array-index path. Both remain zero-alloc.- No regressions in LeadingZeroCount or Recording benchmarks — this PR's changes are isolated to
ByteBuffer.
Note: The LeadingZeroCount and Recording results are byte-identical to the PR #157 baseline because the benchmark report files were not regenerated (these benchmarks don't exercise any changed code). The ByteBuffer benchmark was freshly run on this branch.
Issue #141: ByteBuffer — Massive Allocation Waste on Hot Serialisation Path
Summary
ByteBuffer.csis the core serialisation primitive used throughout histogram encoding and decoding.Every call to
PutInt,PutLong,GetInt,GetLong,GetShort,PutDouble, andGetDoublecurrently incurs one or more heap allocations:PutIntandPutLongcallBitConverter.GetBytes(IPAddress.NetworkToHostOrder(value)), which allocates abyte[], then immediately discards it afterArray.Copy.GetInt,GetLong, andGetShortcallIPAddress.HostToNetworkOrder(BitConverter.ToInt32/64(...)), which also performs unnecessary work.PutDoublecallsBitConverter.GetBytesandArray.Reverse, allocating and mutating a temporary array.GetDoubledelegates to a privateToInt64 → CheckedFromBytes → FromByteschain that manually loops over bytes when a direct API call is available.The
IPAddresshost/network order functions exist solely for byte-order conversion; they are a networking API being misused as an endianness utility.System.Buffers.Binary.BinaryPrimitives(available sincenetstandard2.0via theSystem.Memorypackage) providesWriteInt64BigEndian,ReadInt64BigEndian, and equivalents for all required widths, writing directly into aSpan<byte>with zero allocation.On serialisation-heavy workloads (encoding thousands of histogram snapshots) this reduces GC pressure materially.
Affected Files
HdrHistogram/Utilities/ByteBuffer.csBinaryPrimitivesequivalentsHdrHistogram/HdrHistogram.csprojSystem.Memorypackage reference fornetstandard2.0target (if not already present)HdrHistogram.UnitTests/Utilities/ByteBufferTests.csPutInt/GetInt,PutLong/GetLong,PutDouble/GetDouble, and the positionedPutInt(index, value)overloadHdrHistogram.Benchmarking/ByteBufferBenchmarkclass to provide before/after evidenceRequired Code Changes
PutLong(line 267–272)GetLong(line 131–136)PutInt(int value)(line 241–246) andPutInt(int index, int value)(line 256–261)Replace
BitConverter.GetBytes(IPAddress.NetworkToHostOrder(value))+Array.CopywithBinaryPrimitives.WriteInt32BigEndian.GetInt(line 120–125) andGetShort(line 109–114)Replace
IPAddress.HostToNetworkOrder(BitConverter.ToInt32/16(...))withBinaryPrimitives.ReadInt32BigEndian/ReadInt16BigEndian.PutDouble(line 278–285)GetDouble(line 142–147)Using
BitConverter.DoubleToInt64Bits/Int64BitsToDoublewithBinaryPrimitives.WriteInt64BigEndian/ReadInt64BigEndianis compatible withnetstandard2.0, avoiding the need forBinaryPrimitives.WriteDoubleBigEndianwhich requires .NET 5+.Once
GetDoubleis rewritten the following private helpers become dead code and should be deleted:Int64BitsToDouble(line 156–159)ToInt64(line 167–170)CheckedFromBytes(line 180–184)CheckByteArgument(line 196–208)FromBytes(line 218–226)The
using System.Net;import should also be removed onceIPAddressis no longer referenced.Acceptance Criteria
GetShort,GetInt,PutInt,PutInt(index,value),GetLong,PutLong,GetDouble,PutDouble) useBinaryPrimitiveswithAsSpan, performing zero intermediate heap allocations.IPAddress,IPAddress.HostToNetworkOrder, orIPAddress.NetworkToHostOrderremain inByteBuffer.cs.BitConverter.GetBytesorArray.Reverseremain inByteBuffer.cs.ToInt64,CheckedFromBytes,FromBytes,CheckByteArgument,Int64BitsToDouble) are removed.PutInt/GetInt,PutLong/GetLong,PutDouble/GetDouble, andPutInt(index, value).HdrHistogram.Benchmarking/demonstrating the allocation difference.net8.0,net9.0,net10.0,netstandard2.0.dotnet formatpasses with no warnings.Test Strategy
Unit tests to add (
ByteBufferTests.cs)Add a new test class
ByteBufferReadWriteTests(or extend the existing class) with:PutInt_and_GetInt_roundtrip— write a knownint, reset position, read it back, assert equality. Cover positive, negative, andint.MaxValue.PutInt_at_index_and_GetInt_roundtrip— write to a specific index without advancing position; read from that index; assert equality.PutLong_and_GetLong_roundtrip— same pattern forlong.PutDouble_and_GetDouble_roundtrip— same pattern fordouble. Includedouble.NaN,double.PositiveInfinity, and0.0.GetShort_returns_big_endian_value— write known bytes in big-endian order into the raw buffer, callGetShort, assert result.All tests should use xUnit
[Theory]with[InlineData]where multiple values are exercised.Existing tests
The single existing test (
ReadFrom_returns_all_bytes_when_stream_returns_partial_reads) must continue to pass unmodified; it exercises a different code path and is unaffected by this change.Integration / regression
The existing histogram encoding and decoding tests (round-trip encode/decode of
LongHistogramviaHistogramEncoderV2) exercise the full stack and serve as integration regression coverage. These should be confirmed passing.Benchmark
Add
HdrHistogram.Benchmarking/ByteBuffer/ByteBufferBenchmark.cswith:PutLong_Before/PutLong_Afterbenchmarks (or a single parameterised benchmark switching on implementation)GetLong_Before/GetLong_After[MemoryDiagnoser]to surface allocation countsThe issue requires before/after benchmark results to accompany the PR. Because the "before" code will be replaced, record baseline numbers from the original code prior to the change, and include them in the PR description.
Risks and Open Questions
netstandard2.0compatibility —BinaryPrimitivesis inSystem.Buffers.BinaryandAsSpan()on arrays requiresSystem.Memory. These are available innetstandard2.0via theSystem.MemoryNuGet package (version 4.5.x). Verify whetherHdrHistogram.csprojalready references this package; add it if not.BinaryPrimitives.WriteDoubleBigEndiannot available onnetstandard2.0— Mitigated by usingBinaryPrimitives.WriteInt64BigEndian(span, BitConverter.DoubleToInt64Bits(value))instead, which is available across all target frameworks.Byte-order correctness —
IPAddress.HostToNetworkOrderconverts from host byte order (typically little-endian on x86/x64) to big-endian, andBinaryPrimitives.WriteInt64BigEndianwrites in big-endian unconditionally. The replacement is semantically equivalent. This must be confirmed by the round-trip unit tests on a little-endian host.GetShortsemantics — The current implementation callsIPAddress.HostToNetworkOrderon a value read withBitConverter.ToInt16, which means it reads the buffer as little-endian and converts. The replacementBinaryPrimitives.ReadInt16BigEndianreads directly as big-endian, which is correct. Verify by tracing the callers ofGetShort(currently onlyHistogramDecodervariants).Memory<byte>/Span<byte>refactor — The issue mentions refactoringByteBufferto work overMemory<byte>orSpan<byte>to allow caller-supplied pooled memory. This is noted as a secondary suggestion. It is a larger architectural change and should be treated as a separate issue rather than included here, to keep this PR focused and reviewable.Task breakdown
Task List: Issue #141 — ByteBuffer Allocation Elimination
Cross-referenced against all acceptance criteria in
brief.md.1. Project Configuration
HdrHistogram/HdrHistogram.csproj— Add a conditional<PackageReference>forSystem.Memory(version4.5.*) scoped tonetstandard2.0only.The
BinaryPrimitivestype lives inSystem.Buffers.Binary, which ships inSystem.Memoryfornetstandard2.0;net8.0/net9.0/net10.0include it in-box.Verify:
dotnet restoresucceeds;dotnet buildsucceeds on all four target frameworks.2. Implementation Changes —
HdrHistogram/Utilities/ByteBuffer.csAdd
using System.Buffers.Binary;at the top of the file.Required before any
BinaryPrimitivescall compiles.Verify: File compiles without an unresolved-type error.
GetShort(line 109–114) — ReplaceIPAddress.HostToNetworkOrder(BitConverter.ToInt16(_internalBuffer, Position))withBinaryPrimitives.ReadInt16BigEndian(_internalBuffer.AsSpan(Position)).Reads the 16-bit big-endian value directly; no intermediate allocation.
Verify: No reference to
BitConverterorIPAddressremains in this method.GetInt(line 120–125) — ReplaceIPAddress.HostToNetworkOrder(BitConverter.ToInt32(_internalBuffer, Position))withBinaryPrimitives.ReadInt32BigEndian(_internalBuffer.AsSpan(Position)).Verify: No reference to
BitConverterorIPAddressremains in this method.GetLong(line 131–136) — ReplaceIPAddress.HostToNetworkOrder(BitConverter.ToInt64(_internalBuffer, Position))withBinaryPrimitives.ReadInt64BigEndian(_internalBuffer.AsSpan(Position)).Verify: No reference to
BitConverterorIPAddressremains in this method.GetDouble(line 142–147) — Replace theToInt64→CheckedFromBytes→FromBytescall chain with:Verify: Method body references neither
ToInt64nor any private helper; result is semantically equivalent.PutInt(int value)(line 241–246) — ReplaceBitConverter.GetBytes(IPAddress.NetworkToHostOrder(value))+Array.CopywithBinaryPrimitives.WriteInt32BigEndian(_internalBuffer.AsSpan(Position), value); Position += sizeof(int);.Verify: No
BitConverter.GetBytesorArray.Copycall remains in this method.PutInt(int index, int value)(line 256–261) — ReplaceBitConverter.GetBytes(IPAddress.NetworkToHostOrder(value))+Array.CopywithBinaryPrimitives.WriteInt32BigEndian(_internalBuffer.AsSpan(index), value);(position must NOT advance).Verify: Position is not modified; no
BitConverter.GetBytescall remains.PutLong(line 267–272) — ReplaceBitConverter.GetBytes(IPAddress.NetworkToHostOrder(value))+Array.CopywithBinaryPrimitives.WriteInt64BigEndian(_internalBuffer.AsSpan(Position), value); Position += sizeof(long);.Verify: No
BitConverter.GetBytesorArray.Copycall remains in this method.PutDouble(line 278–285) — ReplaceBitConverter.GetBytes+Array.Reversewith:Verify: No
Array.ReverseorBitConverter.GetBytescall remains in this method.3. Dead Code Removal —
HdrHistogram/Utilities/ByteBuffer.csThese five private helpers are unreachable once
GetDoubleis rewritten (acceptance criterion 4).Remove them in a single edit to keep the diff reviewable.
Int64BitsToDouble(line 156–159) — Thin wrapper; callers replaced.ToInt64(line 167–170) — Only called byCheckedFromBytes; now unused.CheckedFromBytes(line 180–184) — Only called byToInt64; now unused.CheckByteArgument(line 196–208) — Only called byCheckedFromBytes; now unused.FromBytes(line 218–226) — Only called byCheckedFromBytes; now unused.Verify:
dotnet buildreports zero compiler warnings about unreachable/unused code; noCS0219orCS8321warnings.4. Import Cleanup —
HdrHistogram/Utilities/ByteBuffer.csusing System.Net;—IPAddressis no longer referenced anywhere in the file after the implementation changes above.Verify: No
CS0246(type not found) orIDE0005(unnecessary using) warnings after removal;dotnet buildsucceeds.5. Unit Tests —
HdrHistogram.UnitTests/Utilities/ByteBufferTests.csAdd a new
ByteBufferReadWriteTestsclass (or extend the existingByteBufferTestsclass) using xUnit[Theory]/[InlineData].PutInt_and_GetInt_roundtrip— Write a knownintviaPutInt, resetPositionto 0, read viaGetInt, assert equality.Use
[InlineData]with at least: a positive value, a negative value, andint.MaxValue.Verify: All three inline cases pass; position advances by
sizeof(int)(4).PutInt_at_index_and_GetInt_roundtrip— CallPutInt(index, value)at a non-zero index; confirmPositiondid not change; read from the same index; assert equality.Use
[InlineData]with at least two different (index, value) pairs.Verify: Position is unchanged after the indexed write; read-back equals the written value.
PutLong_and_GetLong_roundtrip— Same pattern forlong.Use
[InlineData]with at least: a positive value, a negative value, andlong.MaxValue.Verify: All three inline cases pass; position advances by
sizeof(long)(8).PutDouble_and_GetDouble_roundtrip— Same pattern fordouble.Use
[InlineData]with at least:0.0,double.NaN,double.PositiveInfinity, and a normal finite value.Note:
double.NaNequality requiresBitConverter.DoubleToInt64Bitscomparison, not==.Verify: All inline cases pass; position advances by
sizeof(double)(8).GetShort_returns_big_endian_value— Allocate aByteBuffer, write two bytes in known big-endian order directly into the internal buffer (or viaBlockCopy), callGetShort, assert the expectedshortvalue.Use
[InlineData]with at least two known byte sequences.Verify: Result matches the expected big-endian interpretation.
Confirm existing test is unmodified and still passes —
ReadFrom_returns_all_bytes_when_stream_returns_partial_readsmust pass without any change to its body or thePartialReadStreamhelper.Verify: Test run output shows this test green.
6. Integration / Regression Confirmation
dotnet test HdrHistogram.UnitTests/) and confirm all histogram encoding/decoding tests pass unchanged.These tests exercise
HistogramEncoderV2, which calls every rewrittenByteBuffermethod, serving as integration regression coverage.Verify: Zero test failures; zero skipped tests introduced by this change.
7. Benchmarks —
HdrHistogram.Benchmarking/ByteBuffer/ByteBufferBenchmark.csCreate directory
HdrHistogram.Benchmarking/ByteBuffer/and addByteBufferBenchmark.cswith:[MemoryDiagnoser]attribute on the benchmark class.PutLong_Afterbenchmark — callsPutLongin a loop using the newBinaryPrimitivesimplementation.GetLong_Afterbenchmark — callsGetLongin a loop using the newBinaryPrimitivesimplementation.[GlobalSetup]so allocation inside setup is excluded from measurements.Verify:
dotnet build HdrHistogram.Benchmarking/succeeds; the class is discovered by BenchmarkDotNet when run with--list flat.Record baseline benchmark numbers from the original code before any changes and include them in the PR description as a before/after table.
(Because the "before" code will be deleted, run BenchmarkDotNet against the original branch first.)
Verify: PR description contains an
Allocatedcolumn comparison showing zero allocation in the "After" rows.Note: Baseline numbers must be captured from the original branch before merging and included in the PR description.
8. Format and Build Verification
dotnet format HdrHistogram/— Run after all implementation and dead-code-removal changes; fix any reported issues.Verify: Command exits with code 0 and reports no files changed (or all changes were intentional).
dotnet format HdrHistogram.UnitTests/— Run after adding new tests.Verify: Command exits with code 0.
dotnet format HdrHistogram.Benchmarking/— Run after adding the benchmark class.Verify: Command exits with code 0.
Multi-framework build check —
dotnet build HdrHistogram/ -f netstandard2.0, then repeat fornet8.0,net9.0,net10.0.Verify: Zero errors and zero warnings on all four target frameworks.
Acceptance Criteria Cross-Reference
BinaryPrimitiveswithAsSpan, zero intermediate allocationsIPAddress,HostToNetworkOrder, orNetworkToHostOrderinByteBuffer.csBitConverter.GetBytesorArray.ReverseinByteBuffer.csToInt64,CheckedFromBytes,FromBytes,CheckByteArgument,Int64BitsToDouble) removedPutInt/GetInt,PutLong/GetLong,PutDouble/GetDouble,PutInt(index,value)ByteBufferBenchmarkclass with[MemoryDiagnoser]inHdrHistogram.Benchmarking/net8.0,net9.0,net10.0,netstandard2.0dotnet formatpasses with no warningsCloses #141