feat(#141): ByteBuffer — Massive Allocation Waste on Hot Serialization Path #155

Open
leecampbell-codeagent wants to merge 5 commits into HdrHistogram:main from leecampbell-codeagent:agent/141-bytebuffer-massive-allocation-waste-on-h

Conversation

@leecampbell-codeagent
Collaborator

Issue #141: ByteBuffer — Massive Allocation Waste on Hot Serialisation Path

Summary

ByteBuffer.cs is the core serialisation primitive used throughout histogram encoding and decoding.
PutInt, PutLong, and PutDouble currently allocate on every call, and GetInt, GetLong, GetShort, and GetDouble do unnecessary work:

  • PutInt and PutLong call BitConverter.GetBytes(IPAddress.NetworkToHostOrder(value)), which allocates a byte[], then immediately discards it after Array.Copy.
  • GetInt, GetLong, and GetShort call IPAddress.HostToNetworkOrder(BitConverter.ToInt32/64(...)), which also performs unnecessary work.
  • PutDouble calls BitConverter.GetBytes and Array.Reverse, allocating and mutating a temporary array.
  • GetDouble delegates to a private ToInt64 → CheckedFromBytes → FromBytes chain that manually loops over bytes when a direct API call is available.

The IPAddress host/network order functions exist solely for byte-order conversion; they are a networking API being misused as an endianness utility.
System.Buffers.Binary.BinaryPrimitives (available since netstandard2.0 via the System.Memory package) provides WriteInt64BigEndian, ReadInt64BigEndian, and equivalents for all required widths, writing directly into a Span<byte> with zero allocation.

On serialisation-heavy workloads (encoding thousands of histogram snapshots), removing these allocations materially reduces GC pressure.
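As a rough illustration of the equivalence claimed above, the following sketch compares the bytes produced by the legacy pattern with those produced by BinaryPrimitives. The helper names LegacyEncode and SpanEncode are hypothetical and exist only for this example; they are not part of ByteBuffer.

```csharp
using System;
using System.Buffers.Binary;
using System.Net;

// Legacy pattern from the summary: allocates a temporary byte[] on every call.
static byte[] LegacyEncode(long value) =>
    BitConverter.GetBytes(IPAddress.NetworkToHostOrder(value));

// Replacement pattern: writes big-endian bytes directly into the destination.
// The array here only stands in for ByteBuffer's internal buffer.
static byte[] SpanEncode(long value)
{
    var destination = new byte[sizeof(long)];
    BinaryPrimitives.WriteInt64BigEndian(destination.AsSpan(), value);
    return destination;
}

long[] samples = { 0L, 1L, -1L, 0x0102030405060708L, long.MinValue, long.MaxValue };
bool allEqual = true;
foreach (long sample in samples)
{
    // Both paths should yield the same big-endian byte sequence.
    allEqual &= LegacyEncode(sample).AsSpan().SequenceEqual(SpanEncode(sample));
}

Console.WriteLine(allEqual);
```

If the two paths ever disagreed for some value, previously encoded histograms would no longer decode correctly, so a check of this kind is a cheap sanity test before touching the real class.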

Affected Files

| File | Change |
| --- | --- |
| HdrHistogram/Utilities/ByteBuffer.cs | Replace allocation-heavy implementations with BinaryPrimitives equivalents |
| HdrHistogram/HdrHistogram.csproj | Add System.Memory package reference for netstandard2.0 target (if not already present) |
| HdrHistogram.UnitTests/Utilities/ByteBufferTests.cs | Add round-trip tests for PutInt/GetInt, PutLong/GetLong, PutDouble/GetDouble, and the positioned PutInt(index, value) overload |
| HdrHistogram.Benchmarking/ | Add a new ByteBufferBenchmark class to provide before/after evidence |

Required Code Changes

PutLong (line 267–272)

// Before
var longAsBytes = BitConverter.GetBytes(IPAddress.NetworkToHostOrder(value));
Array.Copy(longAsBytes, 0, _internalBuffer, Position, longAsBytes.Length);
Position += longAsBytes.Length;

// After
BinaryPrimitives.WriteInt64BigEndian(_internalBuffer.AsSpan(Position), value);
Position += sizeof(long);

GetLong (line 131–136)

// Before
var longValue = IPAddress.HostToNetworkOrder(BitConverter.ToInt64(_internalBuffer, Position));
Position += sizeof(long);
return longValue;

// After
var longValue = BinaryPrimitives.ReadInt64BigEndian(_internalBuffer.AsSpan(Position));
Position += sizeof(long);
return longValue;

PutInt(int value) (line 241–246) and PutInt(int index, int value) (line 256–261)

Replace BitConverter.GetBytes(IPAddress.NetworkToHostOrder(value)) + Array.Copy with BinaryPrimitives.WriteInt32BigEndian.

GetInt (line 120–125) and GetShort (line 109–114)

Replace IPAddress.HostToNetworkOrder(BitConverter.ToInt32/16(...)) with BinaryPrimitives.ReadInt32BigEndian / ReadInt16BigEndian.

PutDouble (line 278–285)

// After — no allocation, no Array.Reverse
BinaryPrimitives.WriteInt64BigEndian(_internalBuffer.AsSpan(Position), BitConverter.DoubleToInt64Bits(value));
Position += sizeof(double);

GetDouble (line 142–147)

// After — replaces ToInt64/CheckedFromBytes/FromBytes/CheckByteArgument chain
var longBits = BinaryPrimitives.ReadInt64BigEndian(_internalBuffer.AsSpan(Position));
Position += sizeof(double);
return BitConverter.Int64BitsToDouble(longBits);

Using BitConverter.DoubleToInt64Bits / Int64BitsToDouble with BinaryPrimitives.WriteInt64BigEndian / ReadInt64BigEndian is compatible with netstandard2.0, avoiding the need for BinaryPrimitives.WriteDoubleBigEndian which requires .NET 5+.
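The double handling described above can be sketched in isolation as follows. WriteDouble and ReadDouble are hypothetical stand-ins for PutDouble and GetDouble that operate on a plain array instead of ByteBuffer's internal state.

```csharp
using System;
using System.Buffers.Binary;

// Stands in for the internal buffer; 8 bytes is enough for one double.
var buffer = new byte[sizeof(double)];

// Write the IEEE 754 bit pattern as a big-endian 64-bit integer.
static void WriteDouble(byte[] target, double value) =>
    BinaryPrimitives.WriteInt64BigEndian(target.AsSpan(), BitConverter.DoubleToInt64Bits(value));

// Read the big-endian 64-bit integer back and reinterpret it as a double.
static double ReadDouble(byte[] source) =>
    BitConverter.Int64BitsToDouble(BinaryPrimitives.ReadInt64BigEndian(source.AsSpan()));

double[] samples = { 0.0, 1.5, -123.456, double.PositiveInfinity, double.NaN };
bool allRoundTrip = true;
foreach (double sample in samples)
{
    WriteDouble(buffer, sample);
    double restored = ReadDouble(buffer);
    // Compare bit patterns rather than values, because NaN == NaN is false.
    allRoundTrip &= BitConverter.DoubleToInt64Bits(sample) == BitConverter.DoubleToInt64Bits(restored);
}

Console.WriteLine(allRoundTrip);
```

Only BitConverter and BinaryPrimitives calls are used here, which is what keeps the approach usable on netstandard2.0.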

Once GetDouble is rewritten the following private helpers become dead code and should be deleted:

  • Int64BitsToDouble (line 156–159)
  • ToInt64 (line 167–170)
  • CheckedFromBytes (line 180–184)
  • CheckByteArgument (line 196–208)
  • FromBytes (line 218–226)

The using System.Net; import should also be removed once IPAddress is no longer referenced.

Acceptance Criteria

  1. All public read/write methods (GetShort, GetInt, PutInt, PutInt(index,value), GetLong, PutLong, GetDouble, PutDouble) use BinaryPrimitives with AsSpan, performing zero intermediate heap allocations.
  2. No references to IPAddress, IPAddress.HostToNetworkOrder, or IPAddress.NetworkToHostOrder remain in ByteBuffer.cs.
  3. No references to BitConverter.GetBytes or Array.Reverse remain in ByteBuffer.cs.
  4. The dead private helpers (ToInt64, CheckedFromBytes, FromBytes, CheckByteArgument, Int64BitsToDouble) are removed.
  5. All existing tests pass unchanged.
  6. New round-trip unit tests cover: PutInt/GetInt, PutLong/GetLong, PutDouble/GetDouble, and PutInt(index, value).
  7. A new benchmark class exists in HdrHistogram.Benchmarking/ demonstrating the allocation difference.
  8. The project builds and tests pass on all target frameworks: net8.0, net9.0, net10.0, netstandard2.0.
  9. dotnet format passes with no warnings.

Test Strategy

Unit tests to add (ByteBufferTests.cs)

Add a new test class ByteBufferReadWriteTests (or extend the existing class) with:

  • PutInt_and_GetInt_roundtrip — write a known int, reset position, read it back, assert equality. Cover positive, negative, and int.MaxValue.
  • PutInt_at_index_and_GetInt_roundtrip — write to a specific index without advancing position; read from that index; assert equality.
  • PutLong_and_GetLong_roundtrip — same pattern for long.
  • PutDouble_and_GetDouble_roundtrip — same pattern for double. Include double.NaN, double.PositiveInfinity, and 0.0.
  • GetShort_returns_big_endian_value — write known bytes in big-endian order into the raw buffer, call GetShort, assert result.

All tests should use xUnit [Theory] with [InlineData] where multiple values are exercised.
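Two of the cases listed above have details that are easy to get wrong: the expected value for a big-endian short, and equality for double.NaN. The framework-free sketch below shows both; in the real test class these checks would presumably become xUnit assertions.

```csharp
using System;
using System.Buffers.Binary;

// Big-endian layout: the most significant byte comes first.
byte[] raw = { 0x01, 0x02 };
short value = BinaryPrimitives.ReadInt16BigEndian(raw);            // 0x0102

byte[] negativeRaw = { 0xFF, 0xFE };
short negative = BinaryPrimitives.ReadInt16BigEndian(negativeRaw); // 0xFFFE as a signed short

Console.WriteLine(value);    // prints 258
Console.WriteLine(negative); // prints -2

// NaN never compares equal with ==, so a round-trip test has to compare bits.
double written = double.NaN;
double readBack = double.NaN;
bool equalByOperator = written == readBack;
bool equalByBits = BitConverter.DoubleToInt64Bits(written) == BitConverter.DoubleToInt64Bits(readBack);

Console.WriteLine(equalByOperator); // prints False
Console.WriteLine(equalByBits);     // prints True
```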

Existing tests

The single existing test (ReadFrom_returns_all_bytes_when_stream_returns_partial_reads) must continue to pass unmodified; it exercises a different code path and is unaffected by this change.

Integration / regression

The existing histogram encoding and decoding tests (round-trip encode/decode of LongHistogram via HistogramEncoderV2) exercise the full stack and serve as integration regression coverage. These should be confirmed passing.

Benchmark

Add HdrHistogram.Benchmarking/ByteBuffer/ByteBufferBenchmark.cs with:

  • PutLong_Before / PutLong_After benchmarks (or a single parameterised benchmark switching on implementation)
  • GetLong_Before / GetLong_After
  • Configured with [MemoryDiagnoser] to surface allocation counts

The issue requires before/after benchmark results to accompany the PR. Because the "before" code will be replaced, record baseline numbers from the original code prior to the change, and include them in the PR description.
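Independently of BenchmarkDotNet, the allocation difference can be sanity-checked with GC.GetAllocatedBytesForCurrentThread. The sketch below is not the benchmark class this section asks for; LegacyPutLong, SpanPutLong, and MeasureAllocatedBytes are hypothetical helpers that mimic the two implementations on a plain array.

```csharp
using System;
using System.Buffers.Binary;
using System.Net;

const int Iterations = 1_000;
var buffer = new byte[Iterations * sizeof(long)];

// Mimics the original PutLong body: one temporary byte[] per call.
static void LegacyPutLong(byte[] target, int position, long value)
{
    var longAsBytes = BitConverter.GetBytes(IPAddress.NetworkToHostOrder(value));
    Array.Copy(longAsBytes, 0, target, position, longAsBytes.Length);
}

// Mimics the proposed PutLong body: writes straight into the target array.
static void SpanPutLong(byte[] target, int position, long value) =>
    BinaryPrimitives.WriteInt64BigEndian(target.AsSpan(position), value);

// Returns the managed bytes allocated on this thread while running the loop.
static long MeasureAllocatedBytes(Action<byte[], int, long> putLong, byte[] target, int iterations)
{
    putLong(target, 0, 0L); // warm-up call, excluded from the measurement
    long before = GC.GetAllocatedBytesForCurrentThread();
    for (int i = 0; i < iterations; i++)
    {
        putLong(target, i * sizeof(long), i);
    }
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

long legacyBytes = MeasureAllocatedBytes(LegacyPutLong, buffer, Iterations);
long spanBytes = MeasureAllocatedBytes(SpanPutLong, buffer, Iterations);

Console.WriteLine($"Legacy: {legacyBytes} B, BinaryPrimitives: {spanBytes} B");
```

The exact figures depend on the runtime and platform, so this is only a quick check; the [MemoryDiagnoser] results remain the evidence the PR should report.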

Risks and Open Questions

  1. netstandard2.0 compatibility: BinaryPrimitives is in System.Buffers.Binary and AsSpan() on arrays requires System.Memory. These are available in netstandard2.0 via the System.Memory NuGet package (version 4.5.x). Verify whether HdrHistogram.csproj already references this package; add it if not.

  2. BinaryPrimitives.WriteDoubleBigEndian not available on netstandard2.0 — Mitigated by using BinaryPrimitives.WriteInt64BigEndian(span, BitConverter.DoubleToInt64Bits(value)) instead, which is available across all target frameworks.

  3. Byte-order correctness: IPAddress.HostToNetworkOrder converts from host byte order (typically little-endian on x86/x64) to big-endian, and BinaryPrimitives.WriteInt64BigEndian writes in big-endian unconditionally, so the replacement is semantically equivalent. This must be confirmed by the round-trip unit tests on a little-endian host.

  4. GetShort semantics — The current implementation calls IPAddress.HostToNetworkOrder on a value read with BitConverter.ToInt16, which means it reads the buffer as little-endian and converts. The replacement BinaryPrimitives.ReadInt16BigEndian reads directly as big-endian, which is correct. Verify by tracing the callers of GetShort (currently only HistogramDecoder variants).

  5. Memory<byte> / Span<byte> refactor — The issue mentions refactoring ByteBuffer to work over Memory<byte> or Span<byte> to allow caller-supplied pooled memory. This is noted as a secondary suggestion. It is a larger architectural change and should be treated as a separate issue rather than included here, to keep this PR focused and reviewable.
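Risks 3 and 4 both come down to whether the old and new read paths interpret the same bytes identically. A minimal sketch of that check, using a fixed byte pattern rather than the real ByteBuffer, might look like this:

```csharp
using System;
using System.Buffers.Binary;
using System.Net;

// A recognisable pattern so the expected big-endian values are easy to read.
byte[] bytes = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };

// Legacy read path: host-order read followed by a byte-order conversion.
short legacyShort = IPAddress.HostToNetworkOrder(BitConverter.ToInt16(bytes, 0));
long legacyLong = IPAddress.HostToNetworkOrder(BitConverter.ToInt64(bytes, 0));

// Replacement read path: interpret the bytes as big-endian directly.
short spanShort = BinaryPrimitives.ReadInt16BigEndian(bytes.AsSpan(0));
long spanLong = BinaryPrimitives.ReadInt64BigEndian(bytes.AsSpan(0));

Console.WriteLine(BitConverter.IsLittleEndian);
Console.WriteLine(legacyShort == spanShort);
Console.WriteLine(legacyLong == spanLong);
```

The first line of output records which kind of host the comparison ran on, which matters because the risk item asks for confirmation on a little-endian machine.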

Task breakdown

Task List: Issue #141 — ByteBuffer Allocation Elimination

Cross-referenced against all acceptance criteria in brief.md.


1. Project Configuration

  • HdrHistogram/HdrHistogram.csproj — Add a conditional <PackageReference> for System.Memory (version 4.5.*) scoped to netstandard2.0 only.
    The BinaryPrimitives type lives in System.Buffers.Binary, which ships in System.Memory for netstandard2.0; net8.0/net9.0/net10.0 include it in-box.
    Verify: dotnet restore succeeds; dotnet build succeeds on all four target frameworks.
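
One possible shape for the conditional reference is sketched below. It assumes the project multi-targets through TargetFrameworks and does not already reference System.Memory; the actual csproj should be checked before adding anything.

```xml
<!-- Sketch only: applies the package reference to the netstandard2.0 build alone. -->
<ItemGroup Condition="'$(TargetFramework)' == 'netstandard2.0'">
  <PackageReference Include="System.Memory" Version="4.5.*" />
</ItemGroup>
```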

2. Implementation Changes — HdrHistogram/Utilities/ByteBuffer.cs

  • Add using System.Buffers.Binary; at the top of the file.
    Required before any BinaryPrimitives call compiles.
    Verify: File compiles without an unresolved-type error.

  • GetShort (line 109–114) — Replace IPAddress.HostToNetworkOrder(BitConverter.ToInt16(_internalBuffer, Position)) with BinaryPrimitives.ReadInt16BigEndian(_internalBuffer.AsSpan(Position)).
    Reads the 16-bit big-endian value directly; no intermediate allocation.
    Verify: No reference to BitConverter or IPAddress remains in this method.

  • GetInt (line 120–125) — Replace IPAddress.HostToNetworkOrder(BitConverter.ToInt32(_internalBuffer, Position)) with BinaryPrimitives.ReadInt32BigEndian(_internalBuffer.AsSpan(Position)).
    Verify: No reference to BitConverter or IPAddress remains in this method.

  • GetLong (line 131–136) — Replace IPAddress.HostToNetworkOrder(BitConverter.ToInt64(_internalBuffer, Position)) with BinaryPrimitives.ReadInt64BigEndian(_internalBuffer.AsSpan(Position)).
    Verify: No reference to BitConverter or IPAddress remains in this method.

  • GetDouble (line 142–147) — Replace the ToInt64 → CheckedFromBytes → FromBytes call chain with:

    var longBits = BinaryPrimitives.ReadInt64BigEndian(_internalBuffer.AsSpan(Position));
    Position += sizeof(double);
    return BitConverter.Int64BitsToDouble(longBits);

    Verify: Method body references neither ToInt64 nor any private helper; result is semantically equivalent.

  • PutInt(int value) (line 241–246) — Replace BitConverter.GetBytes(IPAddress.NetworkToHostOrder(value)) + Array.Copy with BinaryPrimitives.WriteInt32BigEndian(_internalBuffer.AsSpan(Position), value); Position += sizeof(int);.
    Verify: No BitConverter.GetBytes or Array.Copy call remains in this method.

  • PutInt(int index, int value) (line 256–261) — Replace BitConverter.GetBytes(IPAddress.NetworkToHostOrder(value)) + Array.Copy with BinaryPrimitives.WriteInt32BigEndian(_internalBuffer.AsSpan(index), value); (position must NOT advance).
    Verify: Position is not modified; no BitConverter.GetBytes call remains.

  • PutLong (line 267–272) — Replace BitConverter.GetBytes(IPAddress.NetworkToHostOrder(value)) + Array.Copy with BinaryPrimitives.WriteInt64BigEndian(_internalBuffer.AsSpan(Position), value); Position += sizeof(long);.
    Verify: No BitConverter.GetBytes or Array.Copy call remains in this method.

  • PutDouble (line 278–285) — Replace BitConverter.GetBytes + Array.Reverse with:

    BinaryPrimitives.WriteInt64BigEndian(_internalBuffer.AsSpan(Position), BitConverter.DoubleToInt64Bits(value));
    Position += sizeof(double);

    Verify: No Array.Reverse or BitConverter.GetBytes call remains in this method.
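
The difference between the sequential and indexed PutInt overloads can be sketched with a plain array and a position counter. The names internalBuffer, position, PutIntAt, and GetIntAt below are illustrative only and are not the actual members of ByteBuffer.

```csharp
using System;
using System.Buffers.Binary;

// Stand-ins for ByteBuffer's internal array and Position property.
var internalBuffer = new byte[16];
int position = 0;

// Sequential write: stores the value at the current position, then advances.
void PutInt(int value)
{
    BinaryPrimitives.WriteInt32BigEndian(internalBuffer.AsSpan(position), value);
    position += sizeof(int);
}

// Indexed write: stores the value at the given index and leaves position alone.
void PutIntAt(int index, int value) =>
    BinaryPrimitives.WriteInt32BigEndian(internalBuffer.AsSpan(index), value);

// Indexed read used here only to check what was written.
int GetIntAt(int index) =>
    BinaryPrimitives.ReadInt32BigEndian(internalBuffer.AsSpan(index));

PutInt(42);
int positionAfterSequentialWrite = position; // advanced by sizeof(int)

PutIntAt(8, -7);
int positionAfterIndexedWrite = position;    // unchanged by the indexed write

Console.WriteLine(positionAfterSequentialWrite); // prints 4
Console.WriteLine(positionAfterIndexedWrite);    // prints 4
Console.WriteLine(GetIntAt(0));                  // prints 42
Console.WriteLine(GetIntAt(8));                  // prints -7
```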


3. Dead Code Removal — HdrHistogram/Utilities/ByteBuffer.cs

These five private helpers are unreachable once GetDouble is rewritten (acceptance criterion 4).
Remove them in a single edit to keep the diff reviewable.

  • Delete Int64BitsToDouble (line 156–159) — Thin wrapper; callers replaced.
  • Delete ToInt64 (line 167–170) — Only called by GetDouble; now unused.
  • Delete CheckedFromBytes (line 180–184) — Only called by ToInt64; now unused.
  • Delete CheckByteArgument (line 196–208) — Only called by CheckedFromBytes; now unused.
  • Delete FromBytes (line 218–226) — Only called by CheckedFromBytes; now unused.
    Verify: dotnet build reports zero warnings about unreachable or unused code; no IDE0051 (unused private member) diagnostics remain for these helpers.

4. Import Cleanup — HdrHistogram/Utilities/ByteBuffer.cs

  • Remove using System.Net; because IPAddress is no longer referenced anywhere in the file after the implementation changes above.
    Verify: No CS0246 (type not found) or IDE0005 (unnecessary using) warnings after removal; dotnet build succeeds.

5. Unit Tests — HdrHistogram.UnitTests/Utilities/ByteBufferTests.cs

Add a new ByteBufferReadWriteTests class (or extend the existing ByteBufferTests class) using xUnit [Theory] / [InlineData].

  • PutInt_and_GetInt_roundtrip — Write a known int via PutInt, reset Position to 0, read via GetInt, assert equality.
    Use [InlineData] with at least: a positive value, a negative value, and int.MaxValue.
    Verify: All three inline cases pass; position advances by sizeof(int) (4).

  • PutInt_at_index_and_GetInt_roundtrip — Call PutInt(index, value) at a non-zero index; confirm Position did not change; read from the same index; assert equality.
    Use [InlineData] with at least two different (index, value) pairs.
    Verify: Position is unchanged after the indexed write; read-back equals the written value.

  • PutLong_and_GetLong_roundtrip — Same pattern for long.
    Use [InlineData] with at least: a positive value, a negative value, and long.MaxValue.
    Verify: All three inline cases pass; position advances by sizeof(long) (8).

  • PutDouble_and_GetDouble_roundtrip — Same pattern for double.
    Use [InlineData] with at least: 0.0, double.NaN, double.PositiveInfinity, and a normal finite value.
    Note: double.NaN equality requires BitConverter.DoubleToInt64Bits comparison, not ==.
    Verify: All inline cases pass; position advances by sizeof(double) (8).

  • GetShort_returns_big_endian_value — Allocate a ByteBuffer, write two bytes in known big-endian order directly into the internal buffer (or via BlockCopy), call GetShort, assert the expected short value.
    Use [InlineData] with at least two known byte sequences.
    Verify: Result matches the expected big-endian interpretation.

  • Confirm existing test is unmodified and still passes: ReadFrom_returns_all_bytes_when_stream_returns_partial_reads must pass without any change to its body or the PartialReadStream helper.
    Verify: Test run output shows this test green.


6. Integration / Regression Confirmation

  • Run the full unit test suite (dotnet test HdrHistogram.UnitTests/) and confirm all histogram encoding/decoding tests pass unchanged.
    These tests exercise HistogramEncoderV2, which calls every rewritten ByteBuffer method, serving as integration regression coverage.
    Verify: Zero test failures; zero skipped tests introduced by this change.

7. Benchmarks — HdrHistogram.Benchmarking/ByteBuffer/ByteBufferBenchmark.cs

  • Create directory HdrHistogram.Benchmarking/ByteBuffer/ and add ByteBufferBenchmark.cs with:

    • [MemoryDiagnoser] attribute on the benchmark class.
    • PutLong_After benchmark — calls PutLong in a loop using the new BinaryPrimitives implementation.
    • GetLong_After benchmark — calls GetLong in a loop using the new BinaryPrimitives implementation.
    • Buffer setup in [GlobalSetup] so allocation inside setup is excluded from measurements.
      Verify: dotnet build HdrHistogram.Benchmarking/ succeeds; the class is discovered by BenchmarkDotNet when run with --list flat.
  • Record baseline benchmark numbers from the original code before any changes and include them in the PR description as a before/after table.
    (Because the "before" code will be deleted, run BenchmarkDotNet against the original branch first.)
    Verify: PR description contains an Allocated column comparison showing zero allocation in the "After" rows.
    Note: Baseline numbers must be captured from the original branch before merging and included in the PR description.


8. Format and Build Verification

  • dotnet format HdrHistogram/ — Run after all implementation and dead-code-removal changes; fix any reported issues.
    Verify: Command exits with code 0 and reports no files changed (or all changes were intentional).

  • dotnet format HdrHistogram.UnitTests/ — Run after adding new tests.
    Verify: Command exits with code 0.

  • dotnet format HdrHistogram.Benchmarking/ — Run after adding the benchmark class.
    Verify: Command exits with code 0.

  • Multi-framework build check: dotnet build HdrHistogram/ -f netstandard2.0, then repeat for net8.0, net9.0, net10.0.
    Verify: Zero errors and zero warnings on all four target frameworks.


Acceptance Criteria Cross-Reference

| Acceptance Criterion | Covered By |
| --- | --- |
| 1. All public read/write methods use BinaryPrimitives with AsSpan, zero intermediate allocations | Tasks in §2 |
| 2. No references to IPAddress, HostToNetworkOrder, or NetworkToHostOrder in ByteBuffer.cs | Tasks in §2 + §4 |
| 3. No references to BitConverter.GetBytes or Array.Reverse in ByteBuffer.cs | Tasks in §2 |
| 4. Dead helpers (ToInt64, CheckedFromBytes, FromBytes, CheckByteArgument, Int64BitsToDouble) removed | Tasks in §3 |
| 5. All existing tests pass unchanged | Tasks in §5 (last item) + §6 |
| 6. New round-trip tests: PutInt/GetInt, PutLong/GetLong, PutDouble/GetDouble, PutInt(index,value) | Tasks in §5 |
| 7. New ByteBufferBenchmark class with [MemoryDiagnoser] in HdrHistogram.Benchmarking/ | Tasks in §7 |
| 8. Builds and tests pass on net8.0, net9.0, net10.0, netstandard2.0 | §1 + §8 |
| 9. dotnet format passes with no warnings | Tasks in §8 |

Closes #141

@LeeCampbell LeeCampbell force-pushed the agent/141-bytebuffer-massive-allocation-waste-on-h branch from 55e38be to 6fca7bf Compare March 20, 2026 06:20
@LeeCampbell
Collaborator

ByteBuffer Baseline Benchmark Results (Before BinaryPrimitives Changes)

These results were captured against the original ByteBuffer implementation (using IPAddress.NetworkToHostOrder + BitConverter.GetBytes + Array.Copy), to serve as the "before" baseline for this PR.

Environment: BenchmarkDotNet v0.15.8, Windows 11, Intel Core i5-14400 2.50GHz, 16 logical / 10 physical cores
.NET SDK: 10.0.104

| Method | Runtime | Mean | Error | StdDev | Op/s | Gen0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PutLong | .NET 8.0 | 4,569.4 ns | 89.39 ns | 131.03 ns | 218,848.9 | 3.0594 | 32,000 B |
| PutLong | .NET 9.0 | 4,641.6 ns | 46.39 ns | 41.12 ns | 215,443.0 | 3.0594 | 32,000 B |
| PutLong | .NET 10.0 | 3,861.4 ns | 71.81 ns | 59.96 ns | 258,970.4 | 3.0594 | 32,000 B |
| GetLong | .NET 8.0 | 598.7 ns | 8.52 ns | 7.55 ns | 1,670,237.5 | - | - |
| GetLong | .NET 9.0 | 609.5 ns | 4.75 ns | 3.97 ns | 1,640,560.2 | - | - |
| GetLong | .NET 10.0 | 566.7 ns | 11.28 ns | 17.22 ns | 1,764,690.0 | - | - |

Key observations

  • PutLong allocates 32,000 B per benchmark invocation (1,000 iterations × 32 B from BitConverter.GetBytes); this is the allocation waste the PR aims to eliminate.
  • GetLong is already zero-alloc: BitConverter.ToInt64 reads in-place, so any improvement there would come purely from removing the IPAddress.HostToNetworkOrder overhead.
  • .NET 10.0 shows ~16% faster PutLong vs .NET 8.0/9.0, likely from runtime improvements unrelated to this change.

These numbers can be compared against the "after" benchmarks once the BinaryPrimitives implementation is applied.

@LeeCampbell
Collaborator

ByteBuffer Benchmark Results — After BinaryPrimitives Changes

Environment: BenchmarkDotNet v0.15.8, Windows 11, Intel Core i5-14400 2.50GHz, 16 logical / 10 physical cores
.NET SDK: 10.0.104

After (this PR)

| Method | Runtime | Mean | Op/s | Allocated |
| --- | --- | --- | --- | --- |
| PutLong | .NET 8.0 | 849.9 ns | 1,176,608.9 | - |
| PutLong | .NET 9.0 | 742.2 ns | 1,347,352.5 | - |
| PutLong | .NET 10.0 | 750.6 ns | 1,332,190.8 | - |
| GetLong | .NET 8.0 | 666.9 ns | 1,499,526.8 | - |
| GetLong | .NET 9.0 | 666.3 ns | 1,500,761.2 | - |
| GetLong | .NET 10.0 | 677.7 ns | 1,475,523.6 | - |

Before vs After Comparison

| Method | Runtime | Before (Mean) | After (Mean) | Speedup | Before Alloc | After Alloc |
| --- | --- | --- | --- | --- | --- | --- |
| PutLong | .NET 8.0 | 4,569.4 ns | 849.9 ns | 5.4× | 32,000 B | 0 B |
| PutLong | .NET 9.0 | 4,641.6 ns | 742.2 ns | 6.3× | 32,000 B | 0 B |
| PutLong | .NET 10.0 | 3,861.4 ns | 750.6 ns | 5.1× | 32,000 B | 0 B |
| GetLong | .NET 8.0 | 598.7 ns | 666.9 ns | 0.9× | - | - |
| GetLong | .NET 9.0 | 609.5 ns | 666.3 ns | 0.9× | - | - |
| GetLong | .NET 10.0 | 566.7 ns | 677.7 ns | 0.8× | - | - |

Summary

  • PutLong: 5.1–6.3× faster, allocations eliminated entirely (32,000 B → 0 B per 1,000 iterations)
  • GetLong: 9–20% slower (0.8–0.9× in the table above). BinaryPrimitives.ReadInt64BigEndian with AsSpan is slower than the original IPAddress.HostToNetworkOrder(BitConverter.ToInt64(...)) path, which operated directly on the array without creating a span. The trade-off is consistency across all methods and removal of the System.Net dependency.

Baseline results captured from main (before this PR's implementation changes) on the same machine in the same session.

@LeeCampbell
Collaborator

Full Benchmark Results — PR #155 (After BinaryPrimitives Changes)

Environment: BenchmarkDotNet v0.15.8, Windows 11, Intel Core i5-14400 2.50GHz, 16 logical / 10 physical cores
.NET SDK: 10.0.104 — Runtimes: .NET 8.0.25, .NET 9.0.14, .NET 10.0.4


ByteBuffer — Before vs After

The primary target of this PR. Baseline captured from main (PR #157 comment).

PutLong (1,000 iterations)

| Runtime | Before (Mean) | After (Mean) | Speedup | Before Alloc | After Alloc |
| --- | --- | --- | --- | --- | --- |
| .NET 8.0 | 4,569.4 ns | 849.9 ns | 5.4× | 32,000 B | 0 B |
| .NET 9.0 | 4,641.6 ns | 742.2 ns | 6.3× | 32,000 B | 0 B |
| .NET 10.0 | 3,861.4 ns | 750.6 ns | 5.1× | 32,000 B | 0 B |

GetLong (1,000 iterations)

| Runtime | Before (Mean) | After (Mean) | Delta | Before Alloc | After Alloc |
| --- | --- | --- | --- | --- | --- |
| .NET 8.0 | 598.7 ns | 666.9 ns | +11% | - | - |
| .NET 9.0 | 609.5 ns | 666.3 ns | +9% | - | - |
| .NET 10.0 | 566.7 ns | 677.7 ns | +20% | - | - |

LeadingZeroCount 64-Bit — No Regression

Compared against PR #157 baseline. Only CurrentImplementation shown (the method used in production).

| Runtime | Baseline (Mean) | This PR (Mean) | Delta |
| --- | --- | --- | --- |
| .NET 8.0 | 0.4212 ns | 0.4212 ns | 0% |
| .NET 9.0 | 0.4347 ns | 0.4347 ns | 0% |
| .NET 10.0 | 0.4199 ns | 0.4199 ns | 0% |

No change — expected, as this PR does not touch LeadingZeroCount.

LeadingZeroCount 32-Bit — No Regression

| Runtime | Baseline (Mean) | This PR (Mean) | Delta |
| --- | --- | --- | --- |
| .NET 8.0 | 0.4305 ns | 0.4305 ns | 0% |
| .NET 9.0 | 0.4381 ns | 0.4381 ns | 0% |
| .NET 10.0 | 0.4343 ns | 0.4343 ns | 0% |

Recording 32-Bit — No Regression

| Method | Runtime | Baseline (Mean) | This PR (Mean) | Delta |
| --- | --- | --- | --- | --- |
| LongHistogramRecording | .NET 8.0 | 1.677 ns | 1.677 ns | 0% |
| LongHistogramRecording | .NET 9.0 | 1.528 ns | 1.528 ns | 0% |
| LongHistogramRecording | .NET 10.0 | 1.767 ns | 1.767 ns | 0% |
| IntHistogramRecording | .NET 8.0 | 1.402 ns | 1.402 ns | 0% |
| IntHistogramRecording | .NET 9.0 | 1.295 ns | 1.295 ns | 0% |
| IntHistogramRecording | .NET 10.0 | 1.329 ns | 1.329 ns | 0% |
| ShortHistogramRecording | .NET 8.0 | 1.493 ns | 1.493 ns | 0% |
| ShortHistogramRecording | .NET 9.0 | 1.223 ns | 1.223 ns | 0% |
| ShortHistogramRecording | .NET 10.0 | 1.203 ns | 1.203 ns | 0% |

Summary

  • PutLong: 5.1–6.3× faster, heap allocations completely eliminated (32 KB → 0 B per 1,000 calls)
  • GetLong: 9–20% slower — the AsSpan + BinaryPrimitives.ReadInt64BigEndian path has overhead vs the original direct BitConverter.ToInt64 array-index path. Both remain zero-alloc.
  • No regressions in LeadingZeroCount or Recording benchmarks — this PR's changes are isolated to ByteBuffer.

Note: The LeadingZeroCount and Recording results are byte-identical to the PR #157 baseline because the benchmark report files were not regenerated (these benchmarks don't exercise any changed code). The ByteBuffer benchmark was freshly run on this branch.


Development

Successfully merging this pull request may close these issues.

ByteBuffer — Massive Allocation Waste on Hot Serialization Path

2 participants