▲ | janwas 2 months ago | |
Nice work :) Clang x86 indeed unrolls, which is good. But setting the CC and AA mask constants looks fairly expensive compared to fixed-pattern shuffles. Yes, the 2D aspect of the sorting network complicates things. Transposing is already harder to make VLA and fusing it with the other shuffles certainly doesn't help. |