janwas 5 months ago

On vqsort: yes, the current set of RVV shuffles is awfully limited, and several implementations produce only one element per cycle. We also saw excessive VSETVLI instructions, though I understand that has since been fixed by an extra compiler pass. It could be interesting to retry on a uarch with O(1) shuffles.