jandrewrogers | 3 days ago
The variable-length vectors are probably one of those ideas that sound good on paper but don't work that well in practice. The issue is that you actually do need to know the vector register size in order to properly design and optimize your data structures. Most advanced uses of e.g. AVX-512 are not just doing simple loop-unrolling-style parallelism. They are doing non-trivial slicing and dicing of heterogeneous data structures in parallel. There are idioms that allow you to e.g. process unrelated predicates in parallel using vector instructions, effectively MIMD instead of SIMD. It enables use of vector instructions more pervasively than I think people expect, but it also means you really need to know where the register boundaries are with respect to your data structures. History has generally shown that when it comes to optimization, explicitness is king.
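The "unrelated predicates in parallel" idiom can be sketched in portable C using GCC/Clang vector extensions. The `record` layout, field names, and thresholds below are all hypothetical, chosen just to show the trick: lay out heterogeneous fields so they land in adjacent lanes, then one vector compare answers four logically unrelated questions at once. Note how the layout must be designed around a known lane width, which is the commenter's point about needing to know the register size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record: four unrelated fields, laid out to match SIMD lanes. */
typedef int32_t v4si __attribute__((vector_size(16)));

typedef struct {
    int32_t age;     /* predicate 1: age     >= 18  */
    int32_t score;   /* predicate 2: score   >  700 */
    int32_t balance; /* predicate 3: balance >  0   */
    int32_t strikes; /* predicate 4: strikes <  3   */
} record;

/* Evaluate all four unrelated predicates with ONE vector compare.
 * Negating `strikes` flips its '<' test into the same '>' direction
 * as the other lanes, so a single comparison covers everything. */
static int all_predicates_hold(const record *r) {
    v4si fields = { r->age, r->score, r->balance, -r->strikes };
    v4si limits = { 17,     700,      0,          -3 };
    v4si mask   = fields > limits;   /* per-lane 0 / -1 results */
    return mask[0] & mask[1] & mask[2] & mask[3];
}
```

This is MIMD-in-spirit: each lane runs a different "program" (a different predicate on a different field), even though the hardware executes one SIMD instruction.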
camel-cdr | 3 days ago | parent
> The variable length vectors are probably one of those ideas that sound good on paper but don't work that well in practice

I don't understand this take. You can still query the vector length and have specialized implementations if needed, but the vast majority of cases can be written in a VLA way, even most advanced ones imo. E.g. here are a few things that I know to work well in a VLA style: simdutf (upstream), simdjson (I have a POC), sorting (I would still specialize, but you can have a fast generic fallback), jpeg decoding, heapify, ...
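The vector-length-agnostic pattern being described is strip-mining around a runtime length query. Here's a schematic in plain C, with `query_vl()` standing in for a hardware query like RVV's `vsetvl` (the constant 8 is an arbitrary stand-in for the machine's vector length, and the inner scalar loop stands in for a single vector operation):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a hardware query such as RVV's vsetvl: returns how many
 * elements will be processed this pass, at most `remaining`. The 8 is a
 * hypothetical hardware vector length; VLA code never bakes it in. */
static size_t query_vl(size_t remaining) {
    size_t hw_vl = 8;
    return remaining < hw_vl ? remaining : hw_vl;
}

/* Vector-length-agnostic strip-mined loop: correct for ANY hw_vl,
 * including lengths that don't divide n, with no scalar tail loop. */
static long vla_sum(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; ) {
        size_t vl = query_vl(n - i);     /* how many lanes this pass? */
        for (size_t j = 0; j < vl; j++)  /* stands in for one vector op */
            total += a[i + j];
        i += vl;
    }
    return total;
}
```

The same binary then runs at full width on any implementation, which is the portability argument for VLA; the earlier comment's counterpoint is that data-structure layout tricks can't be expressed this way without knowing `hw_vl` up front.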
positron26 | 1 day ago | parent
This might be a case where -mtune and -march, or just runtime patching, become more important.
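The runtime-patching idea usually looks like dispatch through a function pointer resolved once at startup, which is what ifunc-based function multi-versioning (e.g. GCC's `target_clones`) automates. A hand-rolled sketch, with `cpu_has_wide_vectors()` as a hypothetical probe (a real build would use something like `__builtin_cpu_supports("avx512f")` on x86 or `getauxval` on Linux/ARM):

```c
#include <assert.h>

/* Hypothetical feature probe; hard-wired to the baseline path here so
 * the sketch runs anywhere. */
static int cpu_has_wide_vectors(void) { return 0; }

/* Two implementations of the same contract; in a real build the first
 * would be compiled with a wider -march than the translation unit. */
static int popcount_scalar(unsigned x) { return __builtin_popcount(x); }
static int popcount_wide(unsigned x)   { return __builtin_popcount(x); }

static int (*popcount_impl)(unsigned) = 0;

/* Resolve once on first call, then pay only an indirect call after. */
static int popcount(unsigned x) {
    if (!popcount_impl)
        popcount_impl = cpu_has_wide_vectors() ? popcount_wide
                                               : popcount_scalar;
    return popcount_impl(x);
}
```

The -march/-mtune route bakes the choice in at compile time instead; the dispatch route keeps one binary but reintroduces exactly the "which width am I on?" question the VLA model tries to abstract away.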