extract optimized parser from binary string into separate module#9

Merged
romsahel merged 1 commit into parsers/rfc_2822/line-by-line-parsing-from-stream from parsers/rfc_2822/binary-parsing
Mar 3, 2026

romsahel (Owner) commented Feb 19, 2026

See this PR for details: DockYard#204

This time, however, I moved the refactor/optimization into its own module, since the original repo does not seem to want to merge this change.
Keeping this module side by side with the original lets us remain in sync and benefit from fixes made to the parser and renderer.

Here is a benchmark of this version:

##### With input large #####
Name                       ips        average  deviation         median         99th %
RFC2822Binary             2.56      390.00 ms     ±1.36%      389.94 ms      399.54 ms
RFC2822 (legacy)          1.42      704.02 ms     ±8.70%      682.41 ms      811.63 ms

Comparison:
RFC2822Binary             2.56
RFC2822 (legacy)          1.42 - 1.81x slower +314.01 ms

Memory usage statistics:

Name                     average  deviation         median         99th %
RFC2822Binary          547.98 MB     ±0.00%      547.98 MB      547.98 MB
RFC2822 (legacy)       290.44 MB     ±0.00%      290.44 MB      290.44 MB

Comparison:
RFC2822Binary          547.98 MB
RFC2822 (legacy)       290.44 MB - 0.53x memory usage -257.54533 MB
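
As a quick sanity check on the comparison lines, the ratios can be recomputed from the averages above. This is only a minimal sketch: `BenchmarkRatios` is a hypothetical helper, not part of the PR, and it works from the rounded figures as printed, so it may differ in the last digit from what Benchee derives from unrounded data.

```elixir
defmodule BenchmarkRatios do
  # Hypothetical helper: ratio of two measurements, rounded to two decimals.
  def ratio(a, b), do: Float.round(a / b, 2)
end

# Figures copied from the Benchee output above (averages).
legacy_ms = 704.02
binary_ms = 390.00
binary_mb = 547.98
legacy_mb = 290.44

IO.puts("legacy run time / binary run time: #{BenchmarkRatios.ratio(legacy_ms, binary_ms)}")
IO.puts("binary memory / legacy memory: #{BenchmarkRatios.ratio(binary_mb, legacy_mb)}")
```

Read this way, the figures show the binary parser running about 1.8x faster on the large input while allocating roughly 1.9x the memory of the legacy parser.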

Looking at some more benchmarks, I think there is still room for improvement, especially in some of the string operations. From the fprof profile (on a 1.84M-line test email):

Regex.replace/4            41 calls   1925ms total
String.replace/3           15 calls   1924ms total
Regex.apply_list/5     384408 calls   1154ms own time
:re.loopexec/8         192248 calls    769ms own time
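
One possible direction for the regex-heavy string operations is to replace a `String.replace/3` call that takes a regex with a single pass of binary pattern matching. The sketch below is an illustration only: the module `UnfoldSketch` and the choice of header unfolding (dropping a CRLF that is followed by a space or tab) are assumptions, not code taken from the parser or from the profile. Whether it is actually faster would have to be measured.

```elixir
defmodule UnfoldSketch do
  # Hypothetical example, not the parser's actual code.

  # Regex baseline: remove every CRLF that is followed by a space or tab.
  def unfold_regex(text), do: String.replace(text, ~r/\r\n(?=[ \t])/, "")

  # Binary pattern matching: walk the input once, no regex engine involved.
  def unfold_binary(text) when is_binary(text), do: do_unfold(text, <<>>)

  # CRLF followed by whitespace: drop the CRLF, keep the whitespace byte.
  defp do_unfold(<<"\r\n", ws, rest::binary>>, acc) when ws in [?\s, ?\t],
    do: do_unfold(rest, <<acc::binary, ws>>)

  # Any other byte is copied through unchanged.
  defp do_unfold(<<byte, rest::binary>>, acc),
    do: do_unfold(rest, <<acc::binary, byte>>)

  # End of input: the accumulator is the result.
  defp do_unfold(<<>>, acc), do: acc
end

input = "Subject: hello\r\n world\r\nTo: a@b.c\r\n"
IO.inspect(UnfoldSketch.unfold_binary(input) == UnfoldSketch.unfold_regex(input))
```

Both functions should agree on any input, which makes the regex version a convenient oracle when testing a hand-written replacement before benchmarking it.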

romsahel force-pushed the parsers/rfc_2822/binary-parsing branch from b496165 to 1322470 on February 19, 2026 16:09
romsahel force-pushed the parsers/rfc_2822/line-by-line-parsing-from-stream branch from 1896cd2 to c615d24 on February 20, 2026 13:11
romsahel force-pushed the parsers/rfc_2822/binary-parsing branch from 1322470 to de640f2 on February 20, 2026 13:11
romsahel force-pushed the parsers/rfc_2822/line-by-line-parsing-from-stream branch from c615d24 to e4e901d on February 23, 2026 12:50
romsahel force-pushed the parsers/rfc_2822/binary-parsing branch from de640f2 to 73708e3 on February 23, 2026 12:50
romsahel marked this pull request as ready for review on February 23, 2026 13:18
romsahel merged commit 8b66020 into parsers/rfc_2822/line-by-line-parsing-from-stream on Mar 3, 2026