▲ | the__alchemist 5 months ago |
Noob question! What about AVX-512 makes it unique to assembly programmers? I'm just dipping my toes in, and have been doing some chemistry computations using f32x8, Vec3x8 etc. (256-bit AVX2). I have good workflows set up, but have only been getting a 2x speedup over non-SIMD (I was hoping for closer to 8x). I figured AVX-512 would allow f32x16 etc., which would be mostly a drop-in replacement. (I have macros to set up the types; you input the number of lanes.)
▲ | ack_complete 5 months ago | parent |
AVX-512 has a lot of instructions that just extend vectorization to 512 bits and make it nicer with features like masking. Thus, a very valid use of it is simply to double vectorization width.

But it also has a bunch of specialized instructions that can boost performance beyond the 2x width. One of them is VPCOMPRESSB, which accelerates compact encoding of sparse data. Another is GF2P8AFFINEQB, which is targeted at specific encryption algorithms but can also be abused for general bit shuffling. Algorithms like computing a histogram can benefit significantly, but doing so requires reshaping the algorithm around very particular and peculiar intermediate data layouts that are beyond the transformations a compiler can do. This doesn't literally require assembly language, though; it can often be done with intrinsics.
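(A scalar sketch of what VPCOMPRESSB does — left-packing the bytes selected by a mask — might make the sparse-encoding point concrete. This is Python modeling the instruction's semantics, not the real intrinsic API; the function name is made up for illustration.)

```python
def compress_bytes(data: bytes, mask: list) -> bytes:
    """Scalar model of VPCOMPRESSB semantics: keep only the bytes
    whose mask bit is set, packed contiguously at the front of the
    result. A single AVX-512 VBMI2 instruction does this for up to
    64 bytes at once; this loop is the meaning, not the speed."""
    return bytes(b for b, keep in zip(data, mask) if keep)

# Sparse-data example: strip zero bytes from a block in one pass.
block = bytes([7, 0, 0, 42, 0, 9])
nonzero = [b != 0 for b in block]
packed = compress_bytes(block, nonzero)  # -> bytes([7, 42, 9])
```

The win on real hardware is that the compaction happens per 64-byte vector with no data-dependent branching, which is exactly the pattern that trips up both scalar code and autovectorizers.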
▲ | dzaima 5 months ago | parent |
SIMD only helps where you're arithmetic-limited; you may instead be limited by memory bandwidth, or perhaps by float division if applicable. And if your scalar baseline got autovectorized by the compiler, you'd see roughly no benefit from hand-written SIMD at all. AVX-512 should be just fine via intrinsics/high-level vector types — no different from AVX2 in this regard.
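(A roofline-style back-of-envelope sketch of the bandwidth point above — every hardware number here is a made-up illustrative assumption, not a measurement.)

```python
# Roofline model: a streaming kernel can't run faster than
# min(compute peak, memory bandwidth * arithmetic intensity).

def attainable_gflops(intensity_flop_per_byte, peak_gflops, bw_gb_s):
    """Attainable throughput under the simple roofline model."""
    return min(peak_gflops, bw_gb_s * intensity_flop_per_byte)

# Example kernel: x[i] *= a over a large f32 array.
# Each element moves 8 bytes (4 loaded + 4 stored) for 1 flop,
# so arithmetic intensity = 1/8 flop/byte.
intensity = 1 / 8
scalar_peak = 4.0   # GFLOP/s, hypothetical scalar core
simd_peak = 32.0    # GFLOP/s, same core with 8-wide f32 SIMD
bandwidth = 20.0    # GB/s,   hypothetical memory bandwidth

scalar = attainable_gflops(intensity, scalar_peak, bandwidth)  # 2.5
simd = attainable_gflops(intensity, simd_peak, bandwidth)      # 2.5
# Both hit the bandwidth roof (20 * 0.125 = 2.5 GFLOP/s), so the
# 8-wide vectors buy ~1x here, not 8x: the wider ALUs just wait
# on memory. More arithmetic per byte loaded is what restores the
# SIMD advantage.
```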