camel-cdr 2 months ago
I quite like highway. As mentioned, the last time I tried vqsort for RVV it was surprisingly slow. I tried to replicate that yesterday, but noticed that vqsort is now disabled for RVV: https://github.com/google/highway/blob/400fbf20f2e40b984be12...

Does highway support sorting networks for non-128-bit vector registers? When I compiled it for AVX512, the BaseCase seemed to use only xmm registers: https://godbolt.org/z/qr9xoTGKn
janwas 2 months ago
:) Yes, vqsort recently tickled a bug in clang. I've seen a steady stream of issues, many caused by SLP vectorization or the seeming absence of CI. You might try re-enabling it on GCC.

Yes, wider vectors are supported. The catch is that the sorting network is limited to 16x16 to reduce code explosion. With uint16_t, XMM registers are sufficient for the 8-column case (8 x 16 bits = 128 bits); your Godbolt link does have some YMM for the 16-column case. After changing the key type to uint32_t, we see ZMM as expected.
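To spell out the register-width arithmetic, here is a minimal sketch. It assumes each row of the network holds one key per column, packed into a single vector. `RowBits` and `RegisterClass` are hypothetical helper names for illustration only and are not part of the Highway API.

```cpp
#include <cstddef>
#include <cstdint>

// Bits needed to hold one row of the sorting network, assuming
// (for illustration) one key per column packed into a single vector.
constexpr std::size_t RowBits(std::size_t key_bytes, std::size_t columns) {
  return key_bytes * 8 * columns;
}

// Smallest x86 vector register class that holds `bits` bits.
constexpr const char* RegisterClass(std::size_t bits) {
  return bits <= 128 ? "XMM" : bits <= 256 ? "YMM" : "ZMM";
}

// uint16_t keys, 8 columns: 128 bits, so XMM is enough.
static_assert(RowBits(sizeof(std::uint16_t), 8) == 128, "8 x u16 is 128 bits");
// uint16_t keys, 16 columns: 256 bits, so YMM appears.
static_assert(RowBits(sizeof(std::uint16_t), 16) == 256, "16 x u16 is 256 bits");
// uint32_t keys, 16 columns: 512 bits, so ZMM appears.
static_assert(RowBits(sizeof(std::uint32_t), 16) == 512, "16 x u32 is 512 bits");
```

Under that assumption, the 16-column cap means 16-bit keys can never fill more than a YMM register, which would match what the Godbolt output shows.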