sidewndr46 5 hours ago

Not to suggest you weren't competent, but did you consider, and try to control for, the possibility that your measurement was the problem?

magicalhippo 5 hours ago

Not going to dismiss it, but I did try not to do stupid stuff. I used QueryPerformanceCounter outside the loop, pinned the benchmark thread to a single core, and the array of elements it processed was fairly large. So I don't think overhead and throttling were an issue. The measurements were very consistent and repeatable.
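For illustration, a minimal sketch of that kind of setup, not the code I actually ran. It uses std::chrono::steady_clock as a portable stand-in for QueryPerformanceCounter, leaves out the core pinning (that part is platform-specific), and sum_kernel and time_runs are made-up names for the example. The point is that the clock is only read before and after a full pass over the array, never per element, and that several runs are recorded so you can check they agree.

```cpp
#include <chrono>
#include <numeric>
#include <vector>

// Hypothetical kernel under test: sums a large array of ints.
inline long long sum_kernel(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0LL);
}

// Calls fn() `runs` times and returns the duration of each call in
// nanoseconds. The clock is read only outside the work being measured,
// so timer overhead is paid once per run, not once per element.
template <typename Fn>
std::vector<long long> time_runs(Fn&& fn, int runs) {
    using clock = std::chrono::steady_clock;
    std::vector<long long> ns;
    ns.reserve(runs > 0 ? runs : 0);
    for (int r = 0; r < runs; ++r) {
        const auto t0 = clock::now();
        // Store the result in a volatile so the call is not optimized away.
        volatile long long sink = fn();
        const auto t1 = clock::now();
        (void)sink;
        ns.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count());
    }
    return ns;
}
```

Something like `time_runs([&] { return sum_kernel(data); }, 20)` on a multi-megabyte array gives you a list of timings; if the spread between them is small, the numbers are at least repeatable.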

sidewndr46 4 hours ago

Fair enough. I've only really ever found assembly-level optimization to make any sense on embedded microcontrollers. In my line of work, performance optimization usually means something along the lines of "convince co-workers not to implement their own bubble sort".

magicalhippo 2 hours ago

Yeah, I've also come across a lot of assembly code that was faster 10 years ago but that the compiler now beats. So for a while now my take has been to mostly avoid asm, but if it's needed, always keep a compiled version alongside it, and always do runtime performance detection to select the optimal version.
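A rough sketch of what I mean by runtime selection, under some assumptions: sum_plain is the plain compiled baseline, sum_unrolled is just a manually unrolled C++ stand-in for where a hand-written asm routine would go, and select_fastest is a hypothetical helper, not anything from a real library. Each candidate is timed on sample data at startup and the fastest one is kept as a function pointer.

```cpp
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

using SumFn = long long (*)(const int*, std::size_t);

// Plain compiled baseline; always available as a fallback.
long long sum_plain(const int* p, std::size_t n) {
    long long s = 0;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

// Stand-in for a hand-tuned variant (a real project might use asm here).
long long sum_unrolled(const int* p, std::size_t n) {
    long long a = 0, b = 0, c = 0, d = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += p[i]; b += p[i + 1]; c += p[i + 2]; d += p[i + 3];
    }
    for (; i < n; ++i) a += p[i];  // leftover elements
    return a + b + c + d;
}

// Best (minimum) time in nanoseconds over `runs` calls of fn.
long long best_time_ns(SumFn fn, const int* p, std::size_t n, int runs) {
    using clock = std::chrono::steady_clock;
    long long best = std::numeric_limits<long long>::max();
    for (int r = 0; r < runs; ++r) {
        const auto t0 = clock::now();
        volatile long long sink = fn(p, n);  // keep the call from being elided
        const auto t1 = clock::now();
        (void)sink;
        const long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        if (ns < best) best = ns;
    }
    return best;
}

// Returns the candidate with the lowest measured time on this machine,
// or nullptr if the candidate list is empty.
SumFn select_fastest(const std::vector<SumFn>& candidates,
                     const int* sample, std::size_t n, int runs) {
    SumFn winner = nullptr;
    long long winner_ns = std::numeric_limits<long long>::max();
    for (SumFn fn : candidates) {
        const long long ns = best_time_ns(fn, sample, n, runs);
        if (winner == nullptr || ns < winner_ns) {
            winner = fn;
            winner_ns = ns;
        }
    }
    return winner;
}
```

You'd call `select_fastest({sum_plain, sum_unrolled}, data, n, 5)` once at startup and go through the returned pointer afterwards. If a newer compiler ends up beating the hand-tuned routine, the plain version simply wins the selection and nothing else has to change.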