dzaima 2 months ago

> The great joy of basic x86 encoding is that you don't actually need to put things in registers to operate on them.

That's... 1 register saved, out of 16 (or 32 with AVX-512). Perhaps useful sometimes, but far from a significant factor as far as spills go.

And doing that means you lose the ability to move the load earlier. That's perhaps not too significant on OoO hardware, but it's still a consideration: while reorder windows are multiple hundreds of instructions, the actual OoO limit is the scheduling queues, which are frequently under a hundred entries, i.e. a couple dozen cycles' worth of instructions, at which point the ≥4-cycle latency of a load is not actually insignificant. And putting the load directly in the arith op is the worst-case scenario for this.