Make out-of-place deinterleaving multi-threaded by Shnatsel · Pull Request #100 · QuState/PhastFT

Shnatsel · 2026-03-27T19:53:51Z

3x improvement on both Zen4 and M4 for size 27. Part of #99

On par with #95 on x86 and far ahead of it on M4.

TODO: heuristics on when to switch over to multi-threading to avoid harming single-threaded performance.
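One possible shape for such a heuristic is a plain size cutoff, sketched below using only the standard library. This is not the code from this PR: `PARALLEL_THRESHOLD`, `deinterleave_serial`, and `deinterleave` are hypothetical names, the cutoff value is a placeholder that would have to come from benchmarks, and `std::thread::scope` stands in for whatever threading mechanism the crate actually uses.

```rust
use std::thread;

// Hypothetical cutoff in complex elements; a real value needs benchmarking.
const PARALLEL_THRESHOLD: usize = 1 << 20;

// Split interleaved [re0, im0, re1, im1, ...] into separate re/im slices.
fn deinterleave_serial(input: &[f32], re: &mut [f32], im: &mut [f32]) {
    for ((pair, r), i) in input.chunks_exact(2).zip(re.iter_mut()).zip(im.iter_mut()) {
        *r = pair[0];
        *i = pair[1];
    }
}

// Out-of-place deinterleave that only spawns threads for large inputs.
fn deinterleave(input: &[f32]) -> (Vec<f32>, Vec<f32>) {
    assert!(input.len() % 2 == 0, "input must hold whole (re, im) pairs");
    let n = input.len() / 2;
    let mut re = vec![0.0; n];
    let mut im = vec![0.0; n];
    let threads = thread::available_parallelism().map_or(1, |p| p.get());

    if n < PARALLEL_THRESHOLD || threads == 1 {
        // Small input: stay on the calling thread to avoid spawn overhead.
        deinterleave_serial(input, &mut re, &mut im);
    } else {
        // Large input: give each thread one contiguous chunk of the output.
        let chunk = (n + threads - 1) / threads;
        thread::scope(|s| {
            for ((inp, r), i) in input
                .chunks(chunk * 2)
                .zip(re.chunks_mut(chunk))
                .zip(im.chunks_mut(chunk))
            {
                s.spawn(move || deinterleave_serial(inp, r, i));
            }
        });
    }
    (re, im)
}
```

A fixed cutoff is the simplest option; the right value likely differs between Zen4 and M4 given how different their memory throughput is.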

codecov-commenter · 2026-03-27T19:56:44Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.79%. Comparing base (3fc78e0) to head (217c68e).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #100   +/-   ##
=======================================
  Coverage   99.79%   99.79%           
=======================================
  Files           8        8           
  Lines        1438     1438           
=======================================
  Hits         1435     1435           
  Misses          3        3

Shnatsel · 2026-03-27T20:38:46Z

Interestingly, collecting the output of interleaving (combine_re_im) into a Vec runs at 9 GiB/s on Zen4, while writing to an existing slice only runs at 6 GiB/s.

No difference on M4, which runs at 40 GiB/s in both cases. That memory subsystem sure is something.
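For readers following along, the two output strategies being compared look roughly like this. It is a single-threaded sketch with hypothetical function names (`interleave_collect`, `interleave_into`), not the actual `combine_re_im` from the crate, and it says nothing about which variant is faster on a given machine.

```rust
// Variant A: build a fresh Vec by collecting an iterator.
// The allocation is new, so the output is written exactly once.
fn interleave_collect(re: &[f32], im: &[f32]) -> Vec<f32> {
    assert_eq!(re.len(), im.len());
    re.iter().zip(im).flat_map(|(&r, &i)| [r, i]).collect()
}

// Variant B: overwrite a buffer that the caller already allocated
// and initialized.
fn interleave_into(re: &[f32], im: &[f32], out: &mut [f32]) {
    assert_eq!(re.len(), im.len());
    assert_eq!(out.len(), re.len() * 2);
    for ((pair, &r), &i) in out.chunks_exact_mut(2).zip(re).zip(im) {
        pair[0] = r;
        pair[1] = i;
    }
}
```

Both produce the same `[re0, im0, re1, im1, ...]` layout; the only difference is where the output memory comes from.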

Shnatsel · 2026-03-27T21:21:18Z

The x86 drop may be due to cache coherency protocols, where the data needs to be read before it can be overwritten. The intrinsics to bypass this are unstable in std but there are prefetch and prefetch_index crates that might help.

Make out-of-place deinterleaving multi-threaded

b53702a

Shnatsel added 3 commits March 27, 2026 20:10

Refactor deinterleaving to accept an output slice and parallelize it as well

6b48b49

Revert "Refactor deinterleaving to accept an output slice and parallelize it as well" This reverts commit 6b48b49.

93f8c2a

Parallelize deinterleaving by collecting into a vec instead of writing to an existing buffer to see how that performs

217c68e

Shnatsel wants to merge 4 commits into main from parallel-out-of-place-deinterleave