
Make out-of-place deinterleaving multi-threaded #100

Draft: Shnatsel wants to merge 4 commits into main from parallel-out-of-place-deinterleave


@Shnatsel (Collaborator) commented Mar 27, 2026

3x improvement on both Zen4 and M4 for size 27. Part of #99.

On par with #95 on x86 and far ahead of it on M4.

TODO: heuristics for when to switch over to multi-threading, to avoid harming single-threaded performance.
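A minimal sketch of what a multi-threaded out-of-place deinterleave with a size cutoff could look like, assuming `f32` data stored as interleaved `[re0, im0, re1, im1, ...]` pairs and using only `std::thread::scope`. The function names and the `PARALLEL_THRESHOLD` value are hypothetical placeholders, not this PR's code; the cutoff stands in for the heuristic the TODO mentions and would have to be tuned by benchmarking.

```rust
use std::thread;

/// Hypothetical cutoff (in complex elements) below which spawning threads is
/// assumed not to pay off. Placeholder value; real tuning needs benchmarks.
const PARALLEL_THRESHOLD: usize = 1 << 16;

/// Single-threaded out-of-place deinterleave:
/// [re0, im0, re1, im1, ...] -> re = [re0, re1, ...], im = [im0, im1, ...].
fn deinterleave_serial(input: &[f32], re: &mut [f32], im: &mut [f32]) {
    for ((pair, r), i) in input.chunks_exact(2).zip(re.iter_mut()).zip(im.iter_mut()) {
        *r = pair[0];
        *i = pair[1];
    }
}

/// Splits the work into one contiguous chunk per available thread. Each thread
/// receives disjoint sub-slices of `re` and `im`, so no synchronization is needed.
fn deinterleave_parallel(input: &[f32], re: &mut [f32], im: &mut [f32]) {
    let n = re.len();
    assert_eq!(input.len(), 2 * n);
    assert_eq!(im.len(), n);

    // Small inputs stay on the current thread to avoid spawn overhead.
    if n < PARALLEL_THRESHOLD {
        deinterleave_serial(input, re, im);
        return;
    }

    let threads = thread::available_parallelism().map(|t| t.get()).unwrap_or(1);
    let chunk = n.div_ceil(threads); // complex elements per thread, >= 1 here

    // Scoped threads may borrow the slices; all of them are joined on exit.
    thread::scope(|s| {
        for ((inp, r), i) in input
            .chunks(2 * chunk)
            .zip(re.chunks_mut(chunk))
            .zip(im.chunks_mut(chunk))
        {
            s.spawn(move || deinterleave_serial(inp, r, i));
        }
    });
}
```

Splitting into one contiguous chunk per thread keeps each thread streaming through its own region of memory, which suits a bandwidth-bound operation like this one.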

@codecov-commenter commented Mar 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.79%. Comparing base (3fc78e0) to head (217c68e).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #100   +/-   ##
=======================================
  Coverage   99.79%   99.79%           
=======================================
  Files           8        8           
  Lines        1438     1438           
=======================================
  Hits         1435     1435           
  Misses          3        3           


@Shnatsel (Collaborator, Author)

Interestingly, collecting the output of interleaving (combine_re_im) into a Vec runs at 9 GiB/s on Zen4, while writing it into an existing slice runs at only 6 GiB/s.

No difference on M4, which runs at 40 GiB/s in both cases. That memory subsystem sure is something.
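To make the two output strategies concrete, here is a minimal sketch of each, assuming separate `f32` real and imaginary slices as input. `interleave_collect` and `interleave_into` are hypothetical stand-ins, not the crate's `combine_re_im`; they produce identical data and differ only in where it is written.

```rust
/// Variant 1 (hypothetical): allocate and return a fresh Vec of
/// interleaved pairs [re0, im0, re1, im1, ...].
fn interleave_collect(re: &[f32], im: &[f32]) -> Vec<f32> {
    assert_eq!(re.len(), im.len());
    re.iter().zip(im).flat_map(|(&r, &i)| [r, i]).collect()
}

/// Variant 2 (hypothetical): write the same interleaved pairs into a
/// caller-provided slice of length 2 * re.len().
fn interleave_into(re: &[f32], im: &[f32], out: &mut [f32]) {
    assert_eq!(re.len(), im.len());
    assert_eq!(out.len(), 2 * re.len());
    for ((pair, &r), &i) in out.chunks_exact_mut(2).zip(re).zip(im) {
        pair[0] = r;
        pair[1] = i;
    }
}
```

Since the computation is identical, a throughput gap between the two on the same machine points at the destination memory rather than the interleaving logic.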

@Shnatsel (Collaborator, Author)

The x86 drop may be due to the cache coherency protocol: before a cache line can be overwritten, it first has to be read into the cache (a read-for-ownership). The intrinsics to bypass this are unstable in std, but the prefetch and prefetch_index crates might help.
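For reference, x86 non-temporal ("streaming") stores are the usual way to write around the cache and skip that read. The x86_64-specific SSE intrinsic `_mm_stream_ps` in `std::arch::x86_64` is callable from stable Rust. The following is a minimal sketch, not benchmarked and not the approach taken in this PR, of an interleave that streams when the destination is 16-byte aligned and otherwise falls back to plain stores. The function names are hypothetical.

```rust
/// Writes as many complex elements as possible with SSE non-temporal stores
/// and returns how many were written. Streams only when `out` is 16-byte
/// aligned, because `_mm_stream_ps` requires an aligned destination.
#[cfg(target_arch = "x86_64")]
fn stream_prefix(re: &[f32], im: &[f32], out: &mut [f32]) -> usize {
    use std::arch::x86_64::{
        _mm_loadu_ps, _mm_sfence, _mm_stream_ps, _mm_unpackhi_ps, _mm_unpacklo_ps,
    };

    if (out.as_ptr() as usize) % 16 != 0 {
        return 0;
    }
    let vec_len = re.len() / 4 * 4;
    let out_ptr = out.as_mut_ptr();
    // SAFETY: SSE is always available on x86_64. Loads read re[k..k + 4] and
    // im[k..k + 4], in bounds because k + 4 <= vec_len. Stores write
    // out[2k..2k + 8], in bounds because out.len() == 2 * re.len(), and are
    // 16-byte aligned because `out` is aligned and k is a multiple of 4.
    unsafe {
        for k in (0..vec_len).step_by(4) {
            let r = _mm_loadu_ps(re.as_ptr().add(k));
            let i = _mm_loadu_ps(im.as_ptr().add(k));
            // unpacklo -> [r0, i0, r1, i1], unpackhi -> [r2, i2, r3, i3]
            _mm_stream_ps(out_ptr.add(2 * k), _mm_unpacklo_ps(r, i));
            _mm_stream_ps(out_ptr.add(2 * k + 4), _mm_unpackhi_ps(r, i));
        }
        // Streaming stores are weakly ordered; fence so later reads see them.
        _mm_sfence();
    }
    vec_len
}

/// Fallback for other architectures: stream nothing.
#[cfg(not(target_arch = "x86_64"))]
fn stream_prefix(_re: &[f32], _im: &[f32], _out: &mut [f32]) -> usize {
    0
}

/// Interleaves `re` and `im` into `out` as [re0, im0, re1, im1, ...].
fn interleave_nt(re: &[f32], im: &[f32], out: &mut [f32]) {
    assert_eq!(re.len(), im.len());
    assert_eq!(out.len(), 2 * re.len());
    let done = stream_prefix(re, im, out);
    // Plain stores for the tail, or for everything if streaming was skipped.
    for k in done..re.len() {
        out[2 * k] = re[k];
        out[2 * k + 1] = im[k];
    }
}
```

Non-temporal stores only pay off when the output is not read again soon, so whether they help here would have to be measured on Zen4.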
