Remix.run Logo
janwas 2 months ago

Nice work :) Clang x86 indeed unrolls, which is good. But setting the CC and AA mask constants looks fairly expensive compared to fixed-pattern shuffles.

Yes, the 2D aspect of the sorting network complicates things. Transposing is already harder to make VLA and fusing it with the other shuffles certainly doesn't help.