	▲	camel-cdr 3 months ago
		BLAS, specifically gemm, is one of the rare things where you naturally need to specialize on vector register width. Most problems don't require this, e.g. your basic parallelizable math stuff, Unicode conversion, base64 de/encode, JSON parsing, set intersection, quicksort, bigint, run-length encoding, chacha20, ... And if you run into a problem that benefits from knowing the SIMD width, then just specialize on it. You can totally use variable-length SIMD ISAs in a fixed-length way when required. But most of the time it isn't required, and you have code that easily scales between vector lengths.

		quicksort: most of the time is spent partitioning, which is vector-length agnostic. You can handle the leaves in a vector-length-agnostic way too, but you'll get more efficient code if you specialize (idk how big the impact is; in-vreg bitonic sort is quite efficient).
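
		To make "code that easily scales between vector lengths" concrete, here is a minimal sketch in plain C of the strip-mining loop shape that variable-length SIMD ISAs encourage. It does not use real intrinsics: `set_vl`, `saxpy_vla`, and the `lanes` parameter are hypothetical names made up for illustration, and the inner scalar loop only stands in for a single vector instruction. The point is just that the loop never hard-codes a register width.

```c
#include <stddef.h>

/* Hypothetical stand-in for a "set vector length" step: given the number
 * of remaining elements and the register capacity in elements (lanes),
 * return how many elements this iteration will process. */
static size_t set_vl(size_t remaining, size_t lanes) {
    return remaining < lanes ? remaining : lanes;
}

/* y[i] += a * x[i] for i in [0, n), written in a strip-mined,
 * vector-length-agnostic style. `lanes` is an assumption of this sketch:
 * it models a register width that is only known at run time. */
static void saxpy_vla(size_t n, float a, const float *x, float *y,
                      size_t lanes) {
    while (n > 0) {
        size_t vl = set_vl(n, lanes);
        /* This scalar loop models one vector operation over vl lanes. */
        for (size_t i = 0; i < vl; i++) {
            y[i] += a * x[i];
        }
        /* Advance by however many elements were actually processed;
         * the final iteration handles the tail with a shorter vl. */
        x += vl;
        y += vl;
        n -= vl;
    }
}
```

		Calling `saxpy_vla` with `lanes` set to 4 or to 16 gives the same result, and there is no separate scalar tail loop. A gemm microkernel is different because its register tiling depends on how many elements fit in a register, which is where specializing on the width pays off.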