Remix clone Hacker News


	▲	aqrit 5 days ago
		`_mm_alignr_epi8` is a shuffle whose mask is known at compile time, and LLVM optimizes it well [1]. If you need the exact behavior of `pshufb`, you can use asm or the LLVM intrinsic [2]. IIRC, I once got the compiler to emit a `pshufb` for a runtime shuffle... one where the indices were always guaranteed to be in the 0..15 range? Ironically, I also wanted to try Zig by doing a StreamVByte implementation, but got derailed by the lack of SSE/AVX intrinsics support. [1] https://github.com/aqrit/sse2zig/blob/444ed8d129625ab5deec34... [2] https://github.com/aqrit/sse2zig/blob/444ed8d129625ab5deec34...
	▲	lukaslalinsky 4 days ago
		Oh, that's actually quite neat. It did not occur to me that you can use `@shuffle` with a compile-time mask and have it optimized into a specialized instruction.
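
For anyone following along, here is a rough scalar sketch in plain C of why this works: `palignr` (`_mm_alignr_epi8`) is just a two-source byte shuffle with a constant mask, while `pshufb` also zeroes any lane whose index has the high bit set. This is only a model of the instruction semantics, not the code from the linked repo. The function names (`pshufb_model`, `alignr_model`, `shuffle2_model`, `alignr_equals_const_shuffle`) are made up for illustration, and the two-source shuffle uses the LLVM `shufflevector` index convention (0..15 selects from the first operand, 16..31 from the second) rather than Zig's `@shuffle` mask encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of pshufb on one 16-byte vector: a lane whose index has
   the high bit set becomes 0, otherwise it reads src[index & 15]. */
void pshufb_model(uint8_t dst[16], const uint8_t src[16], const uint8_t idx[16]) {
    uint8_t tmp[16]; /* temporary so dst may alias src */
    for (int i = 0; i < 16; i++)
        tmp[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0F];
    memcpy(dst, tmp, 16);
}

/* Scalar model of palignr: concatenate hi:lo into 32 bytes, shift right
   by `shift` bytes (shift >= 0), keep the low 16 bytes. */
void alignr_model(uint8_t dst[16], const uint8_t hi[16], const uint8_t lo[16], int shift) {
    uint8_t cat[32];
    assert(shift >= 0);
    memcpy(cat, lo, 16);      /* low half of the concatenation */
    memcpy(cat + 16, hi, 16); /* high half */
    for (int i = 0; i < 16; i++) {
        int j = i + shift;
        dst[i] = (j < 32) ? cat[j] : 0; /* bytes shifted in from beyond are 0 */
    }
}

/* Generic two-source shuffle (hypothetical helper): mask values 0..15
   pick from `first`, 16..31 pick from `second`. */
void shuffle2_model(uint8_t dst[16], const uint8_t first[16],
                    const uint8_t second[16], const uint8_t mask[16]) {
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++) {
        assert(mask[i] < 32);
        tmp[i] = (mask[i] < 16) ? first[mask[i]] : second[mask[i] - 16];
    }
    memcpy(dst, tmp, 16);
}

/* For 0 <= shift <= 16, alignr(hi, lo, shift) equals the two-source
   shuffle of (lo, hi) with the constant mask { shift, shift+1, ... }.
   Returns 1 when both models agree on a simple test pattern. */
int alignr_equals_const_shuffle(int shift) {
    uint8_t lo[16], hi[16], mask[16], a[16], b[16];
    assert(shift >= 0 && shift <= 16);
    for (int i = 0; i < 16; i++) {
        lo[i] = (uint8_t)i;        /* bytes 0..15 */
        hi[i] = (uint8_t)(16 + i); /* bytes 16..31 */
        mask[i] = (uint8_t)(shift + i);
    }
    alignr_model(a, hi, lo, shift);
    shuffle2_model(b, lo, hi, mask);
    return memcmp(a, b, 16) == 0;
}
```

Since the mask in `alignr_equals_const_shuffle` depends only on `shift`, a compiler that sees the shift as a constant can match the whole shuffle to a single `palignr`. The zeroing branch in `pshufb_model` is the part a generic shuffle does not express, which is presumably why the exact `pshufb` behavior needs asm or the intrinsic.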