Stores into fixed-size slices are easy to misuse, lead to subtle bugs #33
I've just run into a very subtle bug with SIMD stores in my own code that may also apply to this crate.
I was trying to make these two lines safe:

    let out_data = core::mem::transmute::<*mut i16, *mut __m256i>(data.as_mut_ptr());
    _mm256_storeu_si256(out_data, ymm3);

so I made a helper function to wrap it:

    fn avx_store(input: __m256i, output: &mut [i16]) {
        unsafe { _mm256_storeu_si256(output.as_mut_ptr() as *mut __m256i, input) }
    }

and the original code became

    avx_store(ymm3, &mut data[0..16].try_into().unwrap());

And everything broke. I checked and double-checked and triple-checked and started wondering about a compiler bug because everything was so trivial and obviously correct.
Only with outside help did I realize that my conversion to a fixed-size slice, &mut data[0..16].try_into().unwrap(), was creating an intermediate array instead of giving me a reference to the original slice. The SIMD store then wrote into that intermediate array, and the data never made it into the output.
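For anyone who wants to reproduce this without AVX, here is a minimal sketch of the same pitfall. The names `store16`, `store_into_copy` and `store_in_place` are made up for illustration, and `store16` is only a plain-Rust stand-in for the SIMD store, assuming the helper takes `&mut [i16; 16]`. The key point is that the method call binds tighter than `&mut`, so the expression parses as `&mut (data[0..16].try_into().unwrap())` and the `&mut` ends up borrowing a temporary array copy.

```rust
/// Hypothetical stand-in for a SIMD store: writes the same value to all 16 lanes.
fn store16(value: i16, out: &mut [i16; 16]) {
    for lane in out.iter_mut() {
        *lane = value;
    }
}

/// Buggy shape: `try_into()` auto-borrows the slice immutably and returns a
/// fresh `[i16; 16]` by value; `&mut` then borrows that temporary copy.
fn store_into_copy(data: &mut [i16]) {
    store16(7, &mut data[0..16].try_into().unwrap());
}

/// Fixed shape: borrow the sub-slice mutably first, then convert the
/// reference itself into `&mut [i16; 16]`, which still points into `data`.
fn store_in_place(data: &mut [i16]) {
    let out: &mut [i16; 16] = (&mut data[0..16]).try_into().unwrap();
    store16(7, out);
}
```

Calling `store_into_copy` on a zeroed buffer leaves it zeroed, while `store_in_place` actually fills the first 16 elements. Both versions compile without warnings from rustc, which is what makes the first one so easy to miss.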
Here's the full code if you'd like more context: vstroebel/jpeg-encoder#18
It seems that the function signatures in safe_unaligned_simd sidestep this problem for x86 by accepting dynamically sized arrays for store intrinsics, but it could still be a problem on ARM, e.g. https://docs.rs/safe_unaligned_simd/latest/aarch64-apple-darwin/safe_unaligned_simd/aarch64/fn.vst1_f32.html