pkhuong 5 months ago

There's more to SIMD than BLAS. https://branchfree.org/2024/06/09/a-draft-taxonomy-of-simd-u... .

camel-cdr 5 months ago

BLAS, specifically gemm, is one of the rare things where you naturally need to specialize on vector register width.
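A rough sketch of where the width dependence comes from, in plain Python for illustration only. The micro-kernel keeps an `mr` x `lanes` tile of C in accumulators, where `lanes` stands in for the number of elements per vector register. `matmul_tiled`, `lanes` and `mr` are hypothetical names. Real BLAS kernels choose these tile dimensions per ISA so that the accumulators fill the register file, and that is the width-specific tuning the comment refers to.

```python
def matmul_tiled(A, B, lanes, mr=4):
    """C = A @ B computed with an mr x lanes accumulator tile.

    Hypothetical sketch: 'lanes' models elements per vector register and
    each row of 'acc' models one vector accumulator register.
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, mr):
        for j0 in range(0, n, lanes):
            rows = range(i0, min(i0 + mr, m))
            cols = range(j0, min(j0 + lanes, n))
            # One accumulator "register" per row of the tile.
            acc = [[0.0] * len(cols) for _ in rows]
            for p in range(k):
                bvec = [B[p][j] for j in cols]  # models one vector load of B
                for r, i in enumerate(rows):
                    a = A[i][p]  # models a scalar broadcast from A
                    # models one vector FMA: acc[r] += a * bvec
                    acc[r] = [c + a * b for c, b in zip(acc[r], bvec)]
            # Write the finished tile back to C.
            for r, i in enumerate(rows):
                for c, j in enumerate(cols):
                    C[i][j] = acc[r][c]
    return C
```

The result is the same for any `lanes`; what changes on real hardware is how well the tile maps onto the available registers, which is why gemm kernels are written per width.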

Most problems don't require this, e.g. your basic parallelizable math stuff, Unicode conversion, base64 decode/encode, JSON parsing, set intersection, quicksort*, bigint arithmetic, run-length encoding, ChaCha20, ...

And if you run into a problem that does benefit from knowing the SIMD width, just specialize on it. You can totally use variable-length SIMD ISAs in a fixed-length way when required. But most of the time it isn't required, and you get code that scales easily between vector lengths.
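A minimal sketch of the vector-length-agnostic style, assuming a strip-mined loop in the spirit of RISC-V V's `vsetvl`. This is scalar Python emulation, not real SIMD: `setvl` and `saxpy_vla` are hypothetical names, and `vlmax` models the hardware vector length. The loop never assumes a particular `vlmax`, so it gives identical results for any value; always passing the same `vlmax` roughly corresponds to using the ISA in a fixed-length way.

```python
def setvl(remaining, vlmax):
    """Hypothetical stand-in for vsetvl: lanes granted for this iteration."""
    return min(remaining, vlmax)


def saxpy_vla(a, x, y, vlmax):
    """Return y + a*x using a strip-mined, vector-length-agnostic loop.

    'vlmax' models the hardware vector length; the loop body does not
    depend on its value, only on the 'vl' handed out by setvl().
    """
    out = list(y)
    i, n = 0, len(x)
    while i < n:
        vl = setvl(n - i, vlmax)
        # Models one vector instruction over lanes i .. i+vl-1.
        out[i:i + vl] = [yi + a * xi for xi, yi in zip(x[i:i + vl], y[i:i + vl])]
        i += vl
    return out
```

The tail is handled by the final, shorter `vl`, so there is no separate scalar cleanup loop, which is one reason this style ports cleanly across vector lengths.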

*quicksort: most of the time is spent partitioning, which is vector-length agnostic. You can handle the leaves in a vector-length-agnostic way too, but you'll get more efficient code if you specialize on the vector length (idk how big the impact is; in-register bitonic sort is quite efficient).
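A sketch of why partitioning doesn't care about the vector length, again as scalar Python emulation. Each chunk of `vl` lanes is compared against the pivot to produce a mask, and the lanes are then compressed into the two sides (modelling something like RVV `vcompress` or SVE `COMPACT`). `partition_vla` is a hypothetical name, and this out-of-place version is much simpler than a real in-place vectorized partition.

```python
def partition_vla(data, pivot, vlmax):
    """Stable out-of-place partition into (< pivot) then (>= pivot).

    Processes chunks of up to vlmax lanes; returns (partitioned_list,
    number_of_elements_less_than_pivot). The result does not depend on vlmax.
    """
    lo, hi = [], []
    i, n = 0, len(data)
    while i < n:
        vl = min(n - i, vlmax)
        chunk = data[i:i + vl]
        mask = [v < pivot for v in chunk]  # models a vector compare -> mask
        # Models compress(chunk, mask) and compress(chunk, ~mask).
        lo.extend(v for v, m in zip(chunk, mask) if m)
        hi.extend(v for v, m in zip(chunk, mask) if not m)
        i += vl
    return lo + hi, len(lo)
```

Nothing in the chunk loop depends on `vlmax` beyond the chunk size, which is the property that lets the partition step run unchanged on any vector length.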