dzaima 2 months ago
> The great joy of basic x86 encoding is that you don't actually need to put things in registers to operate on them.

That's... 1 register saved, out of 16 (or 32 on AVX-512). Perhaps useful sometimes, but far from a particularly significant aspect spill-wise. And doing that means you lose the ability to move the load earlier. That's perhaps not too significant on OoO hardware, but it's still a consideration: while reorder windows are multiple hundreds of instructions, the actual OoO limit is the scheduling queues, which are frequently under a hundred entries, i.e. a couple dozen cycles' worth of instructions, at which point the ≥4-cycle latency of a load is not actually insignificant. And putting the load directly in the arith op is the worst-case scenario for this.
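To put rough numbers on the scheduling-queue point, here's a back-of-the-envelope sketch. The specific values (96 queue entries, 4 instructions per cycle, 4-cycle load latency) are assumptions picked to match "under a hundred entries" and "≥4 cycle latency", not measurements of any particular core, and the function names are made up for illustration.

```python
def scheduler_window_cycles(queue_entries: int, issue_width: int) -> float:
    """Cycles' worth of instructions that fit in the scheduling queue."""
    return queue_entries / issue_width


def load_share_of_window(load_latency: int, queue_entries: int, issue_width: int) -> float:
    """Fraction of that window taken up by a single load's latency."""
    return load_latency / scheduler_window_cycles(queue_entries, issue_width)


# Assumed, illustrative numbers (not taken from any specific CPU).
QUEUE_ENTRIES = 96   # "frequently under a hundred entries"
ISSUE_WIDTH = 4      # assumed sustained instructions per cycle
LOAD_LATENCY = 4     # lower bound from the comment: ">=4 cycle latency"

window = scheduler_window_cycles(QUEUE_ENTRIES, ISSUE_WIDTH)
share = load_share_of_window(LOAD_LATENCY, QUEUE_ENTRIES, ISSUE_WIDTH)

print(f"scheduler window: ~{window:.0f} cycles")
print(f"one load's latency: ~{share:.0%} of that window")
```

Under those assumptions the window is about two dozen cycles, so a single minimum-latency load already eats a noticeable slice of it; a wider core or a smaller queue makes the slice bigger.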