rsf 2 hours ago

> Sure, the code is strange, but it is not necessarily inefficient.

Out of the 6 pieces of assembly code in the article, 2 are definitely inefficient: specifically, the 2 clang ones that contain irrelevant writes to the stack. Even if a CPU were smart enough to ignore those instructions with no performance penalty (which is doubtful in itself), at the very least they take up space in memory and caches unnecessarily.

The gcc output when arraySize is 3 is almost certain to be inefficient as well, when you look at portions such as:

        mov     eax, 1
        test    eax, eax
        sete    al
        ret

All this code does is set eax to 0 and return. It could be replaced with "xor eax, eax ; ret", or with "mov eax, 0 ; ret" if there's a reason to avoid "xor" (there's already a mov there). The code as it stands also has the side effect of changing the CPU's flags, but that side effect can't be relied on since we return immediately, and flag values are not part of the return value under this ABI.
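To double-check that reading of the sequence, here is a rough C model of what each instruction does to eax and the zero flag. This is only a sketch: the function names are made up for illustration, and it models nothing beyond eax and ZF.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the gcc snippet: mov / test / sete / ret. */
uint32_t gcc_sequence(void) {
    uint32_t eax = 1;                         /* mov  eax, 1 */
    int zf = ((eax & eax) == 0);              /* test eax, eax: ZF = 1 only if eax == 0 */
    eax = (eax & 0xFFFFFF00u) | (uint32_t)zf; /* sete al: writes only the low byte */
    return eax;                               /* ret */
}

/* Hypothetical model of the suggested replacement: xor / ret. */
uint32_t xor_sequence(void) {
    uint32_t eax = 0xDEADBEEFu;               /* whatever eax held before */
    eax ^= eax;                               /* xor eax, eax */
    return eax;                               /* ret */
}

/* Both sequences leave the same value in eax. */
void check_equivalence(void) {
    assert(gcc_sequence() == 0u);
    assert(xor_sequence() == 0u);
}
```

The mov puts 1 in eax, so test clears ZF, and sete then writes 0 into al. Since the upper 24 bits were already 0 from the mov, the whole register ends up 0.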

So yes, in general benchmarking is the only way to be sure. But when you look at the specifics of the generated code, you can see that at best 4 of the 6 snippets of assembly are optimal, and the actual number is probably lower than that (my best guess is 2).

All that said, I might benchmark everything later on and post a new article about it.

> Also worth mentioning in passing: if you are not compiling with --march=native, all your code is being optimized for some prehistoric ancient least-common-denominator Intel processor, probably a 1990's-era 486, that nobody actually has anymore that has god-only-knows what inadequacies in its execution pipeline. So make sure you are.

Yep: See https://news.ycombinator.com/item?id=46978577
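
For anyone who wants to check what their own build is actually targeting, here is a rough sketch. It assumes GCC or Clang on x86, which predefine macros such as __SSE2__ and __AVX2__ when the selected target allows those extensions. The function names are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: names the highest x86 extension (of the few checked
   here) that the compiler was allowed to assume for this translation unit. */
const char *assumed_isa(void) {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none of the checked extensions";
#endif
}

/* Hypothetical helper: prints the result so two builds can be compared. */
void print_assumed_isa(void) {
    printf("compiler may assume: %s\n", assumed_isa());
}
```

Building this once with default flags and once with -march=native, then comparing the two outputs, shows what the default target leaves on the table for a given machine.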