▲ | Sesse__ 8 months ago | |
The place where I see this really hurt is when Clang/LLVM gets too fancy, in situations like this:
Boom, store-to-load forwarding failure, and a bad stall. The Zen series, for example, seems to be really bad at this (I have only tried up to Zen 3), but pretty much no out-of-order CPU handles this without some kind of penalty.
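For readers who have not run into this, here is a minimal sketch of the general shape of a store-to-load forwarding failure: two narrow stores followed right away by one wider load that spans both. It is an illustration only, not the snippet from the comment above. The function names are hypothetical, and whether a given compiler really emits the stores and the load as written depends on the version, flags, and surrounding code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not taken from the thread.
 * Two 32-bit stores are followed by a 64-bit load covering both.
 * No single store-buffer entry holds all 8 bytes, so the load usually
 * cannot be forwarded and has to wait for the stores to reach the cache. */
uint64_t store_halves_then_load(uint32_t *buf, uint32_t lo, uint32_t hi)
{
    assert(buf != NULL);

    buf[0] = lo;                            /* narrow store #1 (4 bytes) */
    buf[1] = hi;                            /* narrow store #2 (4 bytes) */

    uint64_t combined;
    memcpy(&combined, buf, sizeof combined); /* wide load (8 bytes) over both */
    return combined;
}

/* The same two values combined in registers, with no round trip through
 * memory. This equals the memory version only on little-endian targets. */
uint64_t combine_in_registers(uint32_t lo, uint32_t hi)
{
    return (uint64_t)lo | ((uint64_t)hi << 32);
}
```

The penalty comes from the mismatch between store width and load width, so the usual fixes are either to keep the value in registers or to make the store as wide as the later load.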
▲ | ack_complete 8 months ago | |
This happens with partial autovectorization, too. The compiler fails to vectorize the first loop and then vectorizes the second. The result is a store forwarding failure at the start of the second loop, when it tries to read the output of the first loop, which erases the vectorization gains.
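A rough sketch of the two-loop shape described here, assuming a first loop with a loop-carried dependency (which tends to stay scalar) and a second loop with independent iterations (an easy vectorization target). The function name, the parameters, and the 4-lane width mentioned in the comments are assumptions for illustration. Actual code generation depends on the compiler, flags, and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not taken from the thread.
 * Loop 1 writes tmp[] one 4-byte element at a time.
 * Loop 2, if vectorized, reads tmp[] back in wider chunks (e.g. 16 bytes).
 * For small n the scalar stores are still in the store buffer when the first
 * vector loads issue, so those loads cannot be forwarded from any one store. */
void running_sum_then_scale(const uint32_t *restrict in,
                            uint32_t *restrict tmp,
                            uint32_t *restrict out,
                            size_t n, uint32_t scale)
{
    assert(in != NULL && tmp != NULL && out != NULL);

    /* Loop 1: each iteration depends on the previous acc, so compilers
     * usually leave this scalar. Unsigned math keeps overflow well defined. */
    uint32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += in[i];
        tmp[i] = acc;
    }

    /* Loop 2: iterations are independent, so this is likely to be vectorized. */
    for (size_t i = 0; i < n; ++i) {
        out[i] = tmp[i] * scale;
    }
}
```

Fusing the two loops by hand, so the scaled value is computed from `acc` directly, removes the reload of `tmp[]` and with it the forwarding problem, at the cost of leaving everything scalar.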