dataflow 3 hours ago

The question isn't whether they both take a clock cycle, but rather whether any future implementation of the ISA might conceivably find some sort of performance advantage in one over the other, even if none do right now. From that standpoint, xor seems like a safer bet.

Symmetry an hour ago

There's been a lot of churn over the years, but additions completing in the same timeframe as XORs has been pretty constant. The Pentium 4 double-pumped its ALU, but both XORs and ADDs could happen with half-cycle latency. The POWER6 cut the FO4s of latency per stage from 16 to 10 and kept that parity as well. When you need 2 FO4s for latching between stages and 2 to handle clock jitter at high frequencies, the difference between what a XOR needs and what an ADD needs starts looking smaller, particularly when you include the circuitry to move the data and select the instruction. Maybe if we move to asynchronous circuits?
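
To put rough numbers on the overhead point, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above (2 FO4 for latching, 2 FO4 for jitter, 16 versus 10 FO4 per stage); the function name `stage_budget` is made up for illustration, and it assumes nothing about how many FO4s a real XOR or adder needs.

```python
def stage_budget(stage_fo4, latch_fo4=2, jitter_fo4=2):
    """Return (FO4s left for logic, fraction of the stage spent on fixed overhead).

    latch_fo4 and jitter_fo4 default to the figures quoted in the comment;
    they are taken as given, not measured values.
    """
    overhead = latch_fo4 + jitter_fo4
    usable = stage_fo4 - overhead
    return usable, overhead / stage_fo4


# Compare the two stage lengths mentioned in the comment.
for stage in (16, 10):
    usable, fraction = stage_budget(stage)
    print(f"{stage} FO4 stage: {usable} FO4 for logic, {fraction:.0%} fixed overhead")
```

Under those assumptions, shortening the stage from 16 to 10 FO4 halves the room for actual logic (12 down to 6 FO4) while the fixed share grows from a quarter to 40% of the stage, which is one way to see why the XOR versus ADD gap matters less once the overhead is counted.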

vablings 2 hours ago

De facto standard. Compilers optimize for the CPU, and CPU uarch is now optimizing for compilers.