▲ | aqrit 5 days ago | |
`_mm_alignr_epi8` is a compile-time known shuffle that gets optimized well by LLVM [1]. If you need the exact behavior of `pshufb` you can use asm or the llvm intrinsic [2]. iirc, I once got the compiler to emit a `pshufb` for a runtime shuffle... that always guaranteed indices in the 0..15 range? Ironically, I also wanted to try zig by doing a StreamVByte implementation, but got derailed by the lack of SSE/AVX intrinsics support. [1] https://github.com/aqrit/sse2zig/blob/444ed8d129625ab5deec34... [2] https://github.com/aqrit/sse2zig/blob/444ed8d129625ab5deec34... | ||
▲ | lukaslalinsky 4 days ago | parent [-] | |
Oh, that's actually quite neat, it did not occur to me that you can use @shuffle with a compile time mask and it will optimize it to a specialized instruction. |