Optimize MessageBus::Implementation#decode_channel_name method #384

Open
moberegger wants to merge 2 commits into discourse:main from moberegger:moberegger/optimize-decode_channel_name

@moberegger moberegger commented Apr 1, 2026

Another hotspot showing up in our production profiles.

This PR replaces channel.split(ENCODE_SITE_TOKEN) with a two-branch approach. It first uses byteindex to check for the "$|$" site token, which skips character-encoding boundary checks.

When no site token is found:

  • Skip the split entirely and simply return [channel, nil]. This eliminates both the array and substring allocations that split would produce.

When a site token is found:

  • Using the token position from byteindex, extract the channel and site_id directly with byteslice. This is faster than split because it scans the string only once, and I believe it also skips character-encoding boundary checks. byteslice preserves the source string's encoding, so non-ASCII channel names should be handled correctly.
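
For readers who want to see the shape of the two branches, here is a minimal sketch. It is not the actual diff from this PR: the method is written as a hypothetical standalone function rather than as part of MessageBus::Implementation, and the constant value is taken from the "$|$" token mentioned above. It assumes Ruby 3.2 or later, where String#byteindex is available. Inputs with more than one token, or with a trailing token, may not behave exactly like split.

```ruby
# Illustrative sketch only; names outside the PR description are assumptions.
ENCODE_SITE_TOKEN = "$|$"

# Hypothetical standalone version of the two-branch approach.
def decode_channel_name_sketch(channel)
  # Byte offset of the site token, or nil when the channel has none.
  token_pos = channel.byteindex(ENCODE_SITE_TOKEN)

  # Branch 1: no token, so no split and no substring allocation.
  return [channel, nil] unless token_pos

  # Branch 2: slice both parts by byte offset around the token.
  site_start = token_pos + ENCODE_SITE_TOKEN.bytesize
  [
    channel.byteslice(0, token_pos),
    channel.byteslice(site_start, channel.bytesize - site_start)
  ]
end

decode_channel_name_sketch("/global/test")            # => ["/global/test", nil]
decode_channel_name_sketch("/test/channel$|$default") # => ["/test/channel", "default"]
```

The two calls at the end use the same channel strings as the benchmarks below.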

Some benchmarks against decode_channel_name.

Without a site ID

ruby 4.0.2 (2026-03-17 revision d3da9fec82) +YJIT +PRISM [arm64-darwin25]

Channel: "/global/test"

=== Iterations per second (global channel, no site_id) ===
Warming up --------------------------------------
            original   871.044k i/100ms
           optimized     1.407M i/100ms
Calculating -------------------------------------
            original     10.053M (± 2.2%) i/s   (99.48 ns/i) -     50.521M in   5.028068s
           optimized     18.492M (± 3.2%) i/s   (54.08 ns/i) -     92.844M in   5.026627s

Comparison:
            original: 10052561.9 i/s
           optimized: 18491743.0 i/s - 1.84x  faster

=== Memory allocation (global channel, no site_id) ===
Calculating -------------------------------------
            original    80.000  memsize (     0.000  retained)
                         2.000  objects (     0.000  retained)
                         1.000  strings (     0.000  retained)
           optimized    40.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
           optimized:         40 allocated
            original:         80 allocated - 2.00x more

With a site ID

ruby 4.0.2 (2026-03-17 revision d3da9fec82) +YJIT +PRISM [arm64-darwin25]

Channel: "/test/channel$|$default"

=== Iterations per second (channel with site_id) ===
Warming up --------------------------------------
            original   569.231k i/100ms
           optimized   677.742k i/100ms
Calculating -------------------------------------
            original      6.503M (± 2.4%) i/s  (153.77 ns/i) -     33.015M in   5.079579s
           optimized      7.503M (± 2.4%) i/s  (133.27 ns/i) -     37.954M in   5.061283s

Comparison:
            original:  6503391.7 i/s
           optimized:  7503464.3 i/s - 1.15x  faster

=== Memory allocation (channel with site_id) ===
Calculating -------------------------------------
            original   120.000  memsize (     0.000  retained)
                         3.000  objects (     0.000  retained)
                         2.000  strings (     0.000  retained)
           optimized   120.000  memsize (     0.000  retained)
                         3.000  objects (     0.000  retained)
                         2.000  strings (     0.000  retained)

Comparison:
            original:        120 allocated
           optimized:        120 allocated - same

@moberegger moberegger marked this pull request as ready for review April 2, 2026 00:05