flohofwoe 3 hours ago
> 1000x in AVX512+days of thought compared to the naive version written in a python loop

Out of this 1000x speedup you get 100x by just not using Python though ;)

Also, IIRC the main problem specifically with AVX512 was that mainstream CPUs simply didn't have it, so a smart compiler won't be of much use when the output code only runs on a handful of devices.