Clean up the codegen a bit (particularly x86) by valadaptive · Pull Request #116 · linebender/fearless_simd

valadaptive · 2025-11-12T07:02:54Z

This builds on top of #115. There are no functional changes to the generated code (besides what #115 does), but it cleans up the fearless_simd_gen code:

- The Arch trait has been removed. It operated at the wrong level of abstraction: it makes no sense to call e.g. mk_avx2::make_method with any Arch implementation other than X86.
- Many code generation functions in the AVX2 and SSE4.2 modules used to take the vector type along with its scalar bit width and total bit width. The vector type already provides both widths, so those functions now take only the vector type.
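To illustrate the second point, here is a minimal sketch of a codegen helper that derives everything it needs from the vector type alone. The `VecType` and `ScalarType` definitions, the `n_bits` method, and the helper names below are hypothetical stand-ins written for this example, not the actual fearless_simd_gen items; only the x86 intrinsic naming scheme (`_mm` / `_mm256` prefixes, `ps` / `pd` / `epiN` suffixes) is taken from the real intrinsics.

```rust
/// Hypothetical scalar kinds; the real generator's enum may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ScalarType {
    Float,
    Int,
    Unsigned,
}

/// Hypothetical stand-in for the generator's vector type descriptor.
#[derive(Clone, Debug, PartialEq, Eq)]
struct VecType {
    scalar: ScalarType,
    scalar_bits: usize,
    len: usize,
}

impl VecType {
    fn new(scalar: ScalarType, scalar_bits: usize, len: usize) -> Self {
        Self { scalar, scalar_bits, len }
    }

    /// Total width in bits, derived rather than passed separately.
    fn n_bits(&self) -> usize {
        self.scalar_bits * self.len
    }
}

// An older-style signature might have looked like
// `fn intrinsic_prefix(ty: &VecType, scalar_bits: usize, total_bits: usize)`,
// where the extra arguments could silently disagree with `ty`.

/// Intrinsic name prefix, chosen from the total vector width.
fn intrinsic_prefix(ty: &VecType) -> &'static str {
    match ty.n_bits() {
        128 => "_mm",
        256 => "_mm256",
        bits => panic!("unsupported vector width: {}", bits),
    }
}

/// Intrinsic name suffix, chosen from the scalar kind and width.
fn intrinsic_suffix(ty: &VecType) -> String {
    match (ty.scalar, ty.scalar_bits) {
        (ScalarType::Float, 32) => "ps".to_string(),
        (ScalarType::Float, 64) => "pd".to_string(),
        // Integer addition uses the same intrinsic for signed and unsigned lanes.
        (ScalarType::Int | ScalarType::Unsigned, bits) => format!("epi{}", bits),
        (ScalarType::Float, bits) => panic!("unsupported float width: {}", bits),
    }
}

/// Name of the lane-wise addition intrinsic for `ty`.
fn add_intrinsic(ty: &VecType) -> String {
    format!("{}_add_{}", intrinsic_prefix(ty), intrinsic_suffix(ty))
}
```

With a single `&VecType` argument, a call site cannot pass a width that contradicts the type, which is the kind of redundancy this cleanup removes.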

valadaptive · 2025-11-14T00:54:22Z

I've rebased this now that #115 has landed. I've made a couple more changes: in addition to removing the Arch trait, I've now removed the marker structs that used to implement it. I've also made a few more functions pub(crate), fixing most (but not all) unreachable_pub lints.

DJMcNab

Thanks!

I've only done a cursory review, but the changes can all be reasoned about locally, and nothing jumped out as a big change.
This PR is much easier to review with whitespace hidden.

Stacked on top of #116, because it touches some of the codegen stuff I cleaned up in that PR. It's unfortunate that GitHub doesn't have stacked PRs.

We have the `Bytes` trait, which lets us cast SIMD types to and from raw bytes (currently using `mem::transmute`). We can use its `bitcast` method instead of pulling in bytemuck for the "reinterpret" operations on `Fallback`. On the x86 side, we can use the `_mm_cast[...]` intrinsics. All the x86 integer types are `__m128i` or `__m256i`, so conversions between integer widths are no-ops.

While working on this, I noticed that there are "reinterpret signed as unsigned" ops, but no corresponding "reinterpret unsigned as signed" ops. Are the reinterpret ops worth it at this point if we have the `Bytes` trait?
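As a rough sketch of the idea above, a `Bytes`-style trait can provide reinterpret operations in both directions without an external crate. The trait shape, the associated type, and the `F32x4` / `U32x4` / `I32x4` structs below are assumptions made for this example; the real fearless_simd `Bytes` trait and its `bitcast` method may be declared differently.

```rust
use core::mem;

/// Hypothetical `Bytes`-style trait: a type that converts to and from a
/// fixed-size byte array. Not the actual fearless_simd definition.
trait Bytes: Sized {
    type Bytes;

    fn to_bytes(self) -> Self::Bytes;
    fn from_bytes(bytes: Self::Bytes) -> Self;

    /// Reinterpret `self` as any other type with the same byte representation.
    fn bitcast<U: Bytes<Bytes = Self::Bytes>>(self) -> U {
        U::from_bytes(self.to_bytes())
    }
}

/// Defines a 128-bit, four-lane fallback vector and its `Bytes` impl.
macro_rules! impl_bytes {
    ($name:ident, $scalar:ty) => {
        #[derive(Clone, Copy, Debug, PartialEq)]
        #[repr(C, align(16))]
        struct $name([$scalar; 4]);

        impl Bytes for $name {
            type Bytes = [u8; 16];

            fn to_bytes(self) -> [u8; 16] {
                // SAFETY: both types are exactly 16 bytes with no padding,
                // and every bit pattern is a valid `[u8; 16]`.
                unsafe { mem::transmute::<Self, [u8; 16]>(self) }
            }

            fn from_bytes(bytes: [u8; 16]) -> Self {
                // SAFETY: same size, and every bit pattern is a valid
                // f32, u32, or i32 lane.
                unsafe { mem::transmute::<[u8; 16], Self>(bytes) }
            }
        }
    };
}

impl_bytes!(F32x4, f32);
impl_bytes!(U32x4, u32);
impl_bytes!(I32x4, i32);
```

Because `bitcast` is generic over the target type, "signed as unsigned" and "unsigned as signed" both come from the same method, for example `I32x4([-1, 0, 1, 2]).bitcast::<U32x4>()` and the reverse.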

valadaptive requested review from DJMcNab and LaurenzV November 12, 2025 07:02

valadaptive force-pushed the x86-cleanups branch from 30d5639 to ee8fd07 Compare November 12, 2025 19:21

valadaptive mentioned this pull request Nov 12, 2025

Implement the reinterpret operations without bytemuck #120

Merged

valadaptive added 6 commits November 13, 2025 19:34

Remove the Arch trait

cce86c9

It wasn't operating at the correct level of abstraction.

Remove redundant arguments from x86 codegen funcs

7e32a6b

Take &VecType in more codegen methods

4f614c2

Consistently take VecType by reference

400938e

Remove the Arch trait even more

a4b936c

Fix some unreachable_pub lints

3ab0b30

valadaptive force-pushed the x86-cleanups branch from ee8fd07 to 3ab0b30 Compare November 14, 2025 00:50

DJMcNab approved these changes Nov 14, 2025

valadaptive added this pull request to the merge queue Nov 14, 2025

Merged via the queue into linebender:main with commit 9039b44 Nov 14, 2025
18 checks passed

valadaptive deleted the x86-cleanups branch November 14, 2025 21:57

valadaptive merged 6 commits into linebender:main from valadaptive:x86-cleanups