camel-cdr 2 months ago
The SLP vectorizer is a good point, but I think it's, in comparison with x86, more a problem of the float and vector register files not being shared (in SVE and RVV). You don't need to reconfigure the vector length; just use it at the full width.

> Something like abseil's hash table

If I remember this correctly, the abseil lookup does scale with vector length, as long as you use the native data path width (albeit with small gains). There is a problem with vector length agnostic handling of abseil, which is the iterator API. With a different API, or compilers that could eliminate redundant predicated loads/stores, this would be easier.

> good for SIMT-like codes

Certainly, but I've also seen/written a lot of vector length agnostic code using shuffles, which don't fit into the SIMT paradigm, which means that the scope is larger than just SIMT.

---

As a general comparison, take AVX10/128, AVX10/256 and AVX10/512, overlap their instruction encodings, remove the few instructions that don't make sense anymore, and add a cheap instruction to query the vector length (probably also instructions like vid and viota, for easier shuffle synthesis).

Now you have a variable-length SIMD ISA that feels familiar.

The above is basically what SVE is.
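To make the shuffle point concrete, here is a minimal scalar sketch of how an index-generating instruction in the style of vid lets you synthesize a shuffle without knowing the lane count in advance. Everything in it is an assumption for illustration: `Vec`, `Iota`, `Permute` and `Reverse` are hypothetical names that emulate a vector register with a `std::vector`, not intrinsics from any real ISA or library.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical emulated vector register; its size plays the role of the
// hardware vector length, which the code below never hard-codes.
using Vec = std::vector<int>;

// Emulates an index-generating instruction (vid-style): lanes 0, 1, 2, ...
Vec Iota(std::size_t lanes) {
  Vec v(lanes);
  std::iota(v.begin(), v.end(), 0);
  return v;
}

// Emulates a gather-style register shuffle: out[i] = src[idx[i]].
Vec Permute(const Vec& src, const Vec& idx) {
  Vec out(idx.size());
  for (std::size_t i = 0; i < idx.size(); ++i) {
    out[i] = src[static_cast<std::size_t>(idx[i])];
  }
  return out;
}

// Reverses the lanes for any vector length: the index vector is derived
// from the queried lane count instead of a fixed per-width constant table.
Vec Reverse(const Vec& v) {
  const int lanes = static_cast<int>(v.size());
  Vec idx = Iota(v.size());
  for (int& i : idx) i = lanes - 1 - i;  // would be one vector subtract
  return Permute(v, idx);
}
```

The same `Reverse` works unchanged for 4, 8 or 16 lanes, which is the property that makes such shuffles vector length agnostic even though they are not SIMT-like.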
janwas 2 months ago
(For other readers:) This is what our Highway library does - wrapper functions around intrinsics, plus a (constexpr if possible) Lanes() function to query the length. For very many cases, writing the code once for an 'unknown to the programmer' vector length indeed works. One example that doesn't work so well is a sorting network; its size depends on the vector length. (I see you mention this below.)