dzaima 7 hours ago

SIMD only helps where you're limited by arithmetic throughput. You may instead be limited by memory bandwidth, or perhaps by float division if applicable; and if your scalar comparison already got autovectorized, you'd see roughly no benefit.
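A minimal sketch of the autovectorization point, assuming GCC or Clang. It uses their `vector_size` extension rather than AVX2 intrinsics, and the task (counting elements below a threshold) and function names are hypothetical. At higher optimization levels the compiler may vectorize the scalar loop on its own, in which case the explicit version gains little. On large arrays both may end up memory-bandwidth-bound anyway. Inspecting the generated assembly (e.g. with `-S`) is the reliable way to tell.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar version: the compiler may autovectorize this loop by itself. */
size_t count_below_scalar(const int32_t *a, size_t n, int32_t t) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (a[i] < t);
    return count;
}

/* 8 x int32 = 32 bytes, the width of one AVX2 register. */
typedef int32_t v8i __attribute__((vector_size(32)));

/* Explicit SIMD version using GCC/Clang vector extensions.
   Per-lane int32 counters are fine as long as n / 8 < 2^31. */
size_t count_below_vec(const int32_t *a, size_t n, int32_t t) {
    v8i tv = {0}, acc = {0};
    for (int k = 0; k < 8; k++)
        tv[k] = t;                           /* broadcast the threshold */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        v8i chunk;
        memcpy(&chunk, a + i, sizeof chunk); /* alignment-safe load */
        acc -= (chunk < tv);                 /* true lanes are -1, so subtracting counts them */
    }
    size_t count = 0;
    for (int k = 0; k < 8; k++)
        count += (size_t)acc[k];             /* horizontal sum of the lanes */
    for (; i < n; i++)
        count += (a[i] < t);                 /* scalar tail */
    return count;
}
```

Both functions return the same count; whether the second is actually faster depends on what the compiler already did with the first.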

AVX-512 should be just fine via intrinsics or high-level vector types; it's no different from AVX2 in this regard.
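A sketch of the "high-level vector types" route for AVX-512 width, again assuming GCC or Clang vector extensions (intrinsics such as `_mm512_add_ps` would be the lower-level alternative). Going from AVX2 to AVX-512 width is just a matter of declaring a 64-byte vector type. Built with `-mavx512f`, this can map onto 512-bit registers; without it, the compiler splits the operations into narrower ones, so the same source still builds. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* 16 x float = 64 bytes, the width of one AVX-512 register. */
typedef float v16f __attribute__((vector_size(64)));

/* Sums n floats 16 lanes at a time. Note: this reorders the additions,
   so results can differ slightly from a strict left-to-right scalar sum. */
float sum_v16(const float *x, size_t n) {
    v16f acc = {0};
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        v16f chunk;
        memcpy(&chunk, x + i, sizeof chunk); /* alignment-safe load */
        acc += chunk;                        /* lane-wise add */
    }
    float s = 0.0f;
    for (int k = 0; k < 16; k++)
        s += acc[k];                         /* horizontal sum */
    for (; i < n; i++)
        s += x[i];                           /* scalar tail */
    return s;
}
```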