
perf: Use std::to_chars and thread-local buffers for faster stateless meter send#134

Open
jasonk000 wants to merge 3 commits into sn-proxyd-fork from jkoch/spectator-fast-double

Conversation

@jasonk000
Member

Replace absl::StrFormat + trailing-zero erasure with std::to_chars, and use thread_local strings to eliminate per-send allocations after warmup. The cost shows up specifically in the atlas destroy path.
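A minimal sketch of the technique described above (illustrative names, not the PR's actual code): format the value with std::to_chars into a stack buffer, then append it to a reused thread_local string.

```cpp
#include <cassert>
#include <charconv>
#include <string>

// Appends "<prefix><value>" into a thread_local string that is reused across
// calls, so after warmup no per-send heap allocation occurs on the hot path.
std::string& format_measurement(const std::string& prefix, double value) {
  thread_local std::string tl_msg;
  tl_msg.clear();
  tl_msg.append(prefix);

  // Fixed format: shortest round-trip digits, no trailing zeros, never
  // scientific notation. 512 bytes comfortably covers the fixed-format
  // worst case (DBL_MAX has ~309 integer digits; tiny denormals need more).
  char num_buf[512];
  auto [ptr, ec] = std::to_chars(num_buf, num_buf + sizeof(num_buf), value,
                                 std::chars_format::fixed);
  assert(ec == std::errc());
  tl_msg.append(num_buf, static_cast<std::size_t>(ptr - num_buf));
  return tl_msg;
}
```

After the first call on a given thread, tl_msg retains its capacity, so subsequent sends only copy bytes.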

full chain before

Concurrent/FilterChain/Accept/0/threads:1_mean         10532 ns        10532 ns           20
    atlas_create_ns=135.793 atlas_decode_ns=1.41467k atlas_destroy_ns=4.28813k atlas_encode_ns=581.29
Concurrent/FilterChain/Accept/0/threads:24_mean        12890 ns        12834 ns           20
    atlas_create_ns=153.269 atlas_decode_ns=1.80219k atlas_destroy_ns=5.17023k atlas_encode_ns=840.605
Concurrent/FilterChain/Shed/1/threads:1_mean            7981 ns         7981 ns           20
    atlas_create_ns=137.023 atlas_decode_ns=1.40453k atlas_destroy_ns=2.91061k atlas_encode_ns=0
Concurrent/FilterChain/Shed/1/threads:24_mean           9444 ns         9424 ns           20
    atlas_create_ns=147.787 atlas_decode_ns=1.75386k atlas_destroy_ns=3.47667k atlas_encode_ns=0

full chain after

Concurrent/FilterChain/Accept/0/threads:1_mean         10172 ns        10171 ns           20
    atlas_create_ns=138.092 atlas_decode_ns=1.41502k atlas_destroy_ns=3.88331k atlas_encode_ns=575.575
Concurrent/FilterChain/Accept/0/threads:24_mean        12288 ns        12258 ns           20
    atlas_create_ns=151.505 atlas_decode_ns=1.76232k atlas_destroy_ns=4.65918k atlas_encode_ns=820.659
Concurrent/FilterChain/Shed/1/threads:1_mean            7897 ns         7897 ns           20
    atlas_create_ns=138.271 atlas_decode_ns=1.39419k atlas_destroy_ns=2.82182k atlas_encode_ns=0
Concurrent/FilterChain/Shed/1/threads:24_mean           9088 ns         9044 ns           20
    atlas_create_ns=146.748 atlas_decode_ns=1.73368k atlas_destroy_ns=3.13205k atlas_encode_ns=0

… meter send

Replace absl::StrFormat + trailing-zero erasure with std::to_chars, and
use thread_local strings to eliminate per-send allocations after warmup.
publisher_->send(msg);
// std::to_chars with fixed format: no trailing zeros, no scientific notation,
// ~5-10x faster than absl::StrFormat("%s%f",...) + erase.
char num_buf[327]; // fixed-format double worst case: DBL_MAX ~309 digits
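For contrast, the pattern being replaced can be sketched like this (snprintf stands in for absl::StrFormat to keep the example self-contained; names are illustrative): format with "%f", then erase trailing zeros, allocating a fresh string on every call.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// The old-style path: always prints six decimal places, then trims them back.
std::string old_style_format(const std::string& prefix, double value) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%f", value);  // e.g. 42.0 -> "42.000000"
  std::string s = prefix + buf;                  // allocates every call
  // Erase trailing zeros, then the '.' if the fraction vanished entirely.
  while (!s.empty() && s.back() == '0') s.pop_back();
  if (!s.empty() && s.back() == '.') s.pop_back();
  return s;
}
```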
Member

Is this method accessed concurrently? If not, I wonder if embedding this buffer as a class member would be better: allocated once on the heap together with the class, instead of putting hundreds of bytes on the stack for each function call.

Member

Oh, sorry, the PR title says "thread local".

Member Author

Yes, there was a tradeoff here. The alternative path is to write directly into tl_msg:

tl_msg.resize(length_of_value_prefix + 327)
// then write to_chars directly into tl_msg

The resize is required to guarantee writable space of sufficient size (reserve alone does not let us legally write past size() via data()). However, resize() always value-initializes the new characters, so we end up writing 327 \0s to the buffer, which is not needed the majority of the time. It is easy enough to just put the buffer on the stack and copy over only what we use.

I have a fix to put through, as 327 bytes is not enough in some cases.
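The rejected alternative can be sketched as follows (illustrative names, assuming a worst-case bound constant): writing into the string's own storage requires a resize(), which value-initializes the entire tail even though to_chars then overwrites only the first few bytes.

```cpp
#include <cassert>
#include <charconv>
#include <string>

// Append a double directly into msg's storage; resize() zero-fills the
// reserved tail, which is the wasted work the stack-buffer approach avoids.
void append_double_in_place(std::string& msg, double value) {
  constexpr std::size_t kMaxDoubleChars = 512;  // assumed worst-case bound
  const std::size_t old_size = msg.size();
  msg.resize(old_size + kMaxDoubleChars);       // writes kMaxDoubleChars '\0's
  auto [ptr, ec] = std::to_chars(msg.data() + old_size,
                                 msg.data() + msg.size(), value,
                                 std::chars_format::fixed);
  assert(ec == std::errc());
  msg.resize(static_cast<std::size_t>(ptr - msg.data()));  // trim to written
}
```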

@copperlight copperlight requested review from ecbadeaux and removed request for copperlight March 26, 2026 22:08
@copperlight
Collaborator

Reminder: I moved from Observability to Storage last year.

@jasonk000
Member Author

jasonk000 commented Mar 26, 2026

I am a little confused about the build workflow, since it looks like this branch would never work in CI. It does work in my local proxyd build:

proxyd jkoch ~/cpie-proxyd $ bazel test @spectator//:spectator_test --override_repository=spectator=/home/coder/spectator-cpp
WARNING: The following configs were expanded more than once: [clang, clang-common, libc++]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Invocation ID: 43c84703-f680-4936-8948-3a6435efe8de
INFO: Streaming build results to: https://netflix.buildbuddy.io/invocation/43c84703-f680-4936-8948-3a6435efe8de
WARNING: The following configs were expanded more than once: [clang, clang-common, libc++]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
WARNING: Build options --//proxyd/netflix/common:testing and --compilation_mode have changed, discarding analysis cache (this can be expensive, see https://bazel.build/advanced/performance/iteration-speed).
INFO: Analyzed target @@spectator//:spectator_test (45 packages loaded, 9391 targets configured).
INFO: Found 1 test target...
Target @@spectator//:spectator_test up-to-date:
  bazel-bin/external/spectator/spectator_test
INFO: Elapsed time: 1.443s, Critical Path: 0.09s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action

Executed 0 out of 1 test: 1 test passes.
INFO: Streaming build results to: https://netflix.buildbuddy.io/invocation/43c84703-f680-4936-8948-3a6435efe8de

@ecbadeaux
Contributor

@jasonk000 You are aware that we are in the process of moving proxyd to version 2 of spectator-cpp?

@jasonk000
Member Author

@jasonk000 You are aware that we are in the process of moving proxyd to version 2 of spectator-cpp?

Yes @ecbadeaux, definitely aware of it! If this is thrown away / made obsolete once we migrate, that's no issue, since I already have it sitting here. On the other hand, if v2 takes longer than expected to land, we can take advantage of this work.
