usamoi 5 hours ago

This code is not equivalent to the C++ version. You can compare directly with `*x == [0_u32; SIZE]`; the two generate different code. (That the iterator version doesn't produce optimal code is still an issue, though.)
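For anyone following along, here is a minimal sketch of the two variants being compared. The function names, the `SIZE` value, and the exact shape of the iterator version are assumptions for illustration; the original snippet isn't quoted in this thread. Both functions return the same result, the point is only that the compiler may lower them differently.

```rust
// Hypothetical size; the thread experiments with several values.
pub const SIZE: usize = 8;

// Iterator version (assumed shape of the code under discussion):
// checks each element against zero, one at a time.
pub fn all_zero_iter(x: &[u32; SIZE]) -> bool {
    x.iter().all(|&v| v == 0)
}

// Direct whole-array comparison, as suggested above.
pub fn all_zero_eq(x: &[u32; SIZE]) -> bool {
    *x == [0_u32; SIZE]
}
```

Pasting both into Compiler Explorer with optimizations on and varying `SIZE` is the easiest way to see where the generated code diverges.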

gspr 5 hours ago | parent [-]

Very good point! Thanks!

With the correction, it interestingly enough also produces the good behavior at size=2. It also delays SIMD until size=5. But then, bizarrely, it stops doing SIMD again after size=64.

https://godbolt.org/z/P979nY4nf

The iterator version stays SIMD-y also after size=64, but stops at some point. What?! I don't know enough to understand what's going on. Anyone?

ceteia an hour ago | parent [-]

Might it have something to do with compiler heuristics? Compilers cannot analyze everything: the more time they spend on analysis, the slower compilation gets. So they might use heuristics to guess when to do analysis and how much, as a trade-off between better-optimized output and shorter compilation times.