▲ | Voultapher a day ago | |
The phf as shown compiles to an and plus a neg instruction, while plain % 4 is just the and. I tested it on an Apple M1 machine and saw no difference in performance at all. It's possible to go much faster with vectorization: 3x on the Zen 3 machine. | ||
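For illustration, here is a minimal Rust sketch of the two kinds of index reduction being compared. It assumes a 4-entry table and an unsigned 32-bit hash value. `index_neg_mask` is a hypothetical stand-in for a negate-then-mask reduction, not the phf from the article, and both function names are made up for this example.

```rust
/// Reduce a hash value to a 4-entry table index with a plain modulo.
/// For unsigned integers, `% 4` is equivalent to `& 3`, so an optimizing
/// compiler can emit a single `and` for this.
pub fn index_mod(x: u32) -> usize {
    (x % 4) as usize
}

/// Hypothetical negate-then-mask reduction (a `neg` followed by an `and`).
/// This is an assumed shape for illustration only, not the article's phf.
pub fn index_neg_mask(x: u32) -> usize {
    (x.wrapping_neg() & 3) as usize
}
```

Both functions map the low two bits of the input onto the range 0..4, only in a different order, so either one works as a table index as long as the table is laid out to match. The extra negation costs one more instruction, which is consistent with it being hard to measure in practice.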
▲ | Sesse__ a day ago | parent [-] | |
I didn't say it was slower, just that it was more obfuscated. |