lukaslalinsky | 5 days ago
Just a couple of days ago, I wanted to implement specialized StreamVByte decoding in Zig, but @shuffle() needs the mask to be compile-time known, while _mm_shuffle_epi8() works just fine with a dynamic mask. I remember that some time ago, I couldn't find an alternative to _mm_alignr_epi8().
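For comparison, here is a minimal C sketch of the runtime-mask shuffle that a StreamVByte-style decoder relies on. The helper name `shuffle_bytes_runtime` and the scalar fallback are illustrative assumptions, not code from any library; only `_mm_shuffle_epi8` and the unaligned load/store intrinsics are real SSE calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSSE3__)
#include <tmmintrin.h> /* SSSE3: _mm_shuffle_epi8 */
#endif

/* Hypothetical helper: out[i] = src[mask[i] & 15], or 0 when bit 7 of
 * mask[i] is set. The mask is ordinary runtime data. */
static void shuffle_bytes_runtime(const uint8_t src[16],
                                  const uint8_t mask[16],
                                  uint8_t out[16])
{
    assert(src && mask && out);
#if defined(__SSSE3__)
    /* Both operands are loaded before the store, so out may alias src. */
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    __m128i m = _mm_loadu_si128((const __m128i *)mask);
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi8(v, m));
#else
    /* Scalar fallback with the same per-byte rule as pshufb. */
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    memcpy(out, tmp, 16);
#endif
}
```

A decoder would typically fetch the mask from a lookup table indexed by the control byte, which is why it cannot be a compile-time constant.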
aqrit | 5 days ago | parent
`_mm_alignr_epi8` is a compile-time known shuffle that gets optimized well by LLVM [1]. If you need the exact behavior of `pshufb`, you can use asm or the LLVM intrinsic [2]. iirc, I once got the compiler to emit a `pshufb` for a runtime shuffle... one that always guaranteed indices in the 0..15 range? Ironically, I also wanted to try Zig by doing a StreamVByte implementation, but got derailed by the lack of SSE/AVX intrinsics support.

[1] https://github.com/aqrit/sse2zig/blob/444ed8d129625ab5deec34...
[2] https://github.com/aqrit/sse2zig/blob/444ed8d129625ab5deec34...
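To illustrate why `_mm_alignr_epi8` counts as a compile-time known shuffle, here is a scalar C model of its byte-level behavior. The function name `alignr_model` is made up for this sketch, and it is not taken from the linked repository. With a constant shift `n`, output byte `i` always comes from position `n + i` of the 32-byte concatenation, so the index pattern is fixed before any data is seen.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of _mm_alignr_epi8(a, b, n):
 * concatenate a (high half) and b (low half), shift right by n bytes,
 * keep the low 16 bytes. Bytes shifted in from beyond the top are zero. */
static void alignr_model(const uint8_t a[16], const uint8_t b[16],
                         unsigned n, uint8_t out[16])
{
    assert(n <= 255); /* the real instruction takes an 8-bit immediate */

    uint8_t cat[32];
    memcpy(cat, b, 16);      /* bytes 0..15  come from b */
    memcpy(cat + 16, a, 16); /* bytes 16..31 come from a */

    uint8_t tmp[16];
    for (unsigned i = 0; i < 16; i++) {
        unsigned j = n + i;  /* fixed source index once n is known */
        tmp[i] = (j < 32) ? cat[j] : 0;
    }
    memcpy(out, tmp, 16);
}
```

Because the source indices depend only on `n`, the same operation can be written as a two-input shuffle with a constant mask whenever `n` is below 16, which is the form a compiler can pattern-match.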