Clean up the codegen a bit (particularly x86) #116
Merged
valadaptive merged 6 commits into linebender:main on Nov 14, 2025
30d5639 to ee8fd07
It wasn't operating at the correct level of abstraction.
ee8fd07 to 3ab0b30
I've rebased this now that #115 has landed. I've made a couple more changes: in addition to removing the |
DJMcNab (Member) approved these changes on Nov 14, 2025
Thanks!
I've only done a cursory review, but the changes can all be reasoned about locally, and nothing jumped out as a big change.
This PR is much easier to review with whitespace hidden.
github-merge-queue bot pushed a commit that referenced this pull request on Nov 16, 2025
Stacked on top of #116, because it touches some of the codegen stuff I cleaned up in that PR. It's unfortunate that GitHub doesn't have stacked PRs.

We have the `Bytes` trait, which lets us cast SIMD types to and from raw bytes (currently using `mem::transmute`). We can use its `bitcast` method instead of pulling in bytemuck for the "reinterpret" operations on `Fallback`. On the x86 side, we can use the `_mm_cast[...]` intrinsics. All the x86 integer types are `__m128i` or `__m256i`, so conversions between integer widths are no-ops.

While working on this, I noticed that there are "reinterpret signed as unsigned" ops, but no corresponding "reinterpret unsigned as signed" ops. Are the reinterpret ops worth it at this point if we have the `Bytes` trait?
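To make the "reinterpret via `bitcast`" idea concrete, here is a minimal sketch. It is not the real `fearless_simd` API: `BytesSketch`, `I32x4`, `U32x4`, and the two helper functions are hypothetical names invented for illustration, and the actual `Bytes` trait may be shaped differently. The sketch only assumes what the commit message states, namely that a vector type can be converted to and from raw bytes with `mem::transmute`.

```rust
use std::mem;

/// Hypothetical stand-in for a `Bytes`-style trait (assumption, not the real API).
trait BytesSketch: Sized + Copy {
    /// Raw byte representation, e.g. `[u8; 16]` for a 128-bit vector.
    type Bytes: Copy;

    fn to_bytes(self) -> Self::Bytes;
    fn from_bytes(bytes: Self::Bytes) -> Self;

    /// Reinterpret `self` as any other type with the same byte representation.
    fn bitcast<U: BytesSketch<Bytes = Self::Bytes>>(self) -> U {
        U::from_bytes(self.to_bytes())
    }
}

/// Hypothetical fallback-style vector types backed by plain arrays.
#[derive(Clone, Copy, Debug, PartialEq)]
struct I32x4([i32; 4]);

#[derive(Clone, Copy, Debug, PartialEq)]
struct U32x4([u32; 4]);

impl BytesSketch for I32x4 {
    type Bytes = [u8; 16];

    fn to_bytes(self) -> [u8; 16] {
        // SAFETY: `[i32; 4]` and `[u8; 16]` have the same size, and every bit
        // pattern is valid for both.
        unsafe { mem::transmute(self.0) }
    }

    fn from_bytes(bytes: [u8; 16]) -> Self {
        // SAFETY: same size, every bit pattern is a valid `[i32; 4]`.
        I32x4(unsafe { mem::transmute(bytes) })
    }
}

impl BytesSketch for U32x4 {
    type Bytes = [u8; 16];

    fn to_bytes(self) -> [u8; 16] {
        // SAFETY: `[u32; 4]` and `[u8; 16]` have the same size, and every bit
        // pattern is valid for both.
        unsafe { mem::transmute(self.0) }
    }

    fn from_bytes(bytes: [u8; 16]) -> Self {
        // SAFETY: same size, every bit pattern is a valid `[u32; 4]`.
        U32x4(unsafe { mem::transmute(bytes) })
    }
}

/// "Reinterpret signed as unsigned", expressed through `bitcast`.
fn reinterpret_u32(v: I32x4) -> U32x4 {
    v.bitcast()
}

/// The opposite direction needs no extra machinery.
fn reinterpret_i32(v: U32x4) -> I32x4 {
    v.bitcast()
}
```

Under these assumptions both directions fall out of the same generic method, which is one way to read the question of whether dedicated reinterpret ops are still worth having.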
This builds on top of #115. There are no functional changes to the generated code (besides what #115 does), but it cleans up the `fearless_simd_gen` code:

- The `Arch` trait has been removed. It operated at the wrong level of abstraction: it makes no sense to call e.g. `mk_avx2::make_method` with any `Arch` implementation other than `X86`.
- Many code generation functions in the AVX2 and SSE4.2 modules used to pass in the vector type along with its scalar and total bit widths. The former provides the latter, so we can stop passing all three in and just pass in the vector type.
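
The second cleanup (passing only the vector type) can be illustrated with a small sketch. The names `VecTypeSketch`, `total_bits`, and `intrinsic_name` are hypothetical and are not taken from `fearless_simd_gen`; the point is only that the scalar width and total width are derivable from the vector type, so a codegen helper needs a single parameter.

```rust
/// Hypothetical description of a SIMD vector type (assumption for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct VecTypeSketch {
    /// Width of one lane in bits, e.g. 32 for `i32`.
    scalar_bits: usize,
    /// Number of lanes.
    len: usize,
}

impl VecTypeSketch {
    /// Total width of the vector in bits, derived rather than passed around.
    fn total_bits(&self) -> usize {
        self.scalar_bits * self.len
    }
}

/// Build the name of an x86 integer intrinsic from the operation and the
/// vector type alone. Previously a helper like this might have taken
/// `scalar_bits` and `total_bits` as two extra arguments.
fn intrinsic_name(op: &str, ty: VecTypeSketch) -> String {
    let prefix = match ty.total_bits() {
        128 => "_mm",
        256 => "_mm256",
        512 => "_mm512",
        other => panic!("unsupported vector width: {other} bits"),
    };
    format!("{prefix}_{op}_epi{}", ty.scalar_bits)
}
```

For example, `intrinsic_name("add", VecTypeSketch { scalar_bits: 32, len: 4 })` yields `_mm_add_epi32`. Deriving the widths inside the helper also removes the chance of a caller passing a total width that disagrees with the vector type.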