Voultapher a day ago

Doing the phf as shown is an and + neg instruction pair, while just doing % 4 is only the and. I tested it on an Apple M1 machine and saw no difference in performance at all. It's possible to go much faster with vectorization, 3x on the Zen 3 machine.

Sesse__ a day ago

I didn't say it was slower, just that it was more obfuscated.