perf: Use std::to_chars and thread-local buffers for faster stateless meter send #134
jasonk000 wants to merge 3 commits into sn-proxyd-fork from
… meter send
Replace absl::StrFormat + trailing-zero erasure with std::to_chars, and use thread_local strings to eliminate per-send allocations after warmup.
spectator/stateless_meters.h
Outdated
publisher_->send(msg);
// std::to_chars with fixed format: no trailing zeros, no scientific notation,
// ~5-10x faster than absl::StrFormat("%s%f",...) + erase.
char num_buf[327]; // fixed-format double worst case: DBL_MAX ~309 digits
Is this method accessed concurrently? If not, I wonder whether embedding this buffer as a class member would be better: it would be allocated once on the heap together with the class, instead of putting hundreds of bytes on the stack for each function call.
oh sorry the PR title says "thread local".
Yes, there was a tradeoff/decision here. The alternative path is to write directly to tl_msg as:
tl_msg.resize(length_of_value_prefix + 327)
// then write to_chars directly into tl_msg
The resize is required so that the string's size, not just its capacity (which is all reserve() provides), covers the bytes to_chars writes. However, resize() always zero-fills the new characters, so we end up writing 327 \0s to the buffer, which is not needed the majority of the time. It is easy enough to put the buffer on the stack and only pick up what we use.
I have a fix to put through as 327 is not enough in some cases.
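For readers following the thread, the stack-buffer approach can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `append_fixed`, `build_line`, the example prefix strings, and the 512-byte scratch size are all assumptions made for the sketch. It relies only on the standard `std::to_chars` floating-point overload, which needs a reasonably recent standard library, and it checks the returned error code rather than trusting the buffer size.

```cpp
#include <charconv>
#include <cstddef>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper, not the PR's code: appends `value` to `out` in fixed
// notation. Returns false (and appends nothing) if the scratch buffer is too small.
inline bool append_fixed(std::string& out, double value) {
  char num_buf[512];  // assumed size, deliberately generous for this sketch
  const std::to_chars_result res = std::to_chars(
      num_buf, num_buf + sizeof(num_buf), value, std::chars_format::fixed);
  if (res.ec != std::errc{}) {
    return false;
  }
  // Copy only the bytes to_chars actually produced.
  out.append(num_buf, static_cast<std::size_t>(res.ptr - num_buf));
  return true;
}

// Hypothetical wrapper: builds "<prefix><value>" in a thread-local string.
// The returned view is valid until the next call on the same thread.
inline std::string_view build_line(std::string_view value_prefix, double value) {
  thread_local std::string tl_msg;  // capacity is retained between calls
  tl_msg.clear();                   // size becomes 0, capacity is kept
  tl_msg.append(value_prefix);
  if (!append_fixed(tl_msg, value)) {
    return {};
  }
  return tl_msg;
}
```

Because `clear()` keeps the capacity, once the thread-local string has grown to fit the longest line seen on that thread, later calls should not need to allocate.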
Reminder: I moved from Observability to Storage last year.
I am a little confused about the build workflow, since it looks like this branch would never work in CI. It works on my local proxyd build:
@jasonk000 Are you aware that we are in the process of moving proxyd to version 2 of spectator-cpp?
Yes @ecbadeaux, definitely aware of it! If this is thrown away or made obsolete once we migrate, that's no issue, since I already have it sitting here. On the other hand, if v2 takes longer than expected to land, we can take advantage of this work.
Replace absl::StrFormat + trailing-zero erasure with std::to_chars, and use thread_local strings to eliminate per-send allocations after warmup. This shows up in the atlas destroy path specifically.
full chain before
full chain after
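As a rough illustration of what "format, then erase trailing zeros" versus `std::to_chars` means for the output, here is a small comparison sketch. It is an assumption-laden stand-in, not the original code: `std::snprintf` with `%f` is used in place of `absl::StrFormat`, and the erase logic in the hypothetical `legacy_fixed` is a guess at the general shape of the old path. The hypothetical `shortest_fixed` uses the standard fixed-format `std::to_chars` overload.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdio>
#include <string>
#include <system_error>

// Hypothetical stand-in for the old path: "%f" formatting (6 decimals),
// then erasing trailing zeros and a dangling decimal point.
inline std::string legacy_fixed(double v) {
  char buf[400];  // enough for "%f" of any finite double
  const int n = std::snprintf(buf, sizeof(buf), "%f", v);
  if (n <= 0 || static_cast<std::size_t>(n) >= sizeof(buf)) {
    return {};
  }
  std::string s(buf, static_cast<std::size_t>(n));
  const std::size_t last = s.find_last_not_of('0');
  if (last != std::string::npos) {
    s.erase(last + 1);  // drop the run of trailing zeros
  }
  if (!s.empty() && s.back() == '.') {
    s.pop_back();  // "100." becomes "100"
  }
  return s;
}

// Hypothetical stand-in for the new path: shortest round-trip digits in
// fixed notation, so there is nothing to erase afterwards.
inline std::string shortest_fixed(double v) {
  char buf[512];  // assumed size for this sketch
  const auto res =
      std::to_chars(buf, buf + sizeof(buf), v, std::chars_format::fixed);
  if (res.ec != std::errc{}) {
    return {};
  }
  return std::string(buf, static_cast<std::size_t>(res.ptr - buf));
}
```

For values such as 1.5 or 100.0 the two sketches agree. They can differ for small magnitudes: `%f` rounds to six decimal places, while the fixed-format `std::to_chars` overload keeps the digits needed to round-trip the value.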