▲ | gpderetta 12 hours ago | |
The looping overhead is trivial, especially in SIMD code, where the loop overhead runs on the scalar hardware. Unrolling is definitely needed to schedule and pipeline SIMD code properly, even on OoO cores. Remember that an OoO core cannot reorder dependent instructions, so the dependencies need to be broken manually, for example by adding extra accumulators, which in turn requires additional unrolling. This is especially important in SIMD code, which is typically non-branchy with long dependency chains. | ||
▲ | Remnant44 7 hours ago | parent [-] | |
That's a good point about increased dependency chain length in SIMD code due to the branchless programming style. Unrolling to break a loop-carried dependency is one of the strongest reasons to unroll, especially in SIMD code. Unrolling trivial loops to remove loop-counter overhead hasn't been productive for quite a while now, but unfortunately it's still the default for many compilers. |
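To make the accumulator point concrete, here is a minimal scalar sketch in C of breaking a loop-carried dependency. The function names `sum_single` and `sum_unrolled4` and the unroll factor of 4 are made up for illustration; the right number of accumulators depends on the add latency and throughput of the target core. The same idea applies if each accumulator is a SIMD register instead of a scalar.

```c
#include <stddef.h>

/* One accumulator: each add must wait for the previous add to finish,
   so the loop runs at the latency of a single dependent add chain. */
float sum_single(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}

/* Four accumulators (hypothetical unroll factor): the four adds in the
   loop body are independent of each other, so an OoO core can keep
   several of them in flight at once. */
float sum_unrolled4(const float *x, size_t n) {
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    /* Combine the partial sums, then handle the leftover elements. */
    float acc = (a0 + a1) + (a2 + a3);
    for (; i < n; ++i)
        acc += x[i];
    return acc;
}
```

Note that the second version adds the elements in a different order, so floating-point results can differ slightly in rounding from the first. That is why compilers generally won't do this transformation on float reductions by themselves unless you allow reassociation (for example with `-ffast-math` on GCC or Clang), and why it often has to be written by hand.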