svnt 11 hours ago

His point is that on x86 there is no performance difference, yet everyone except his colleague/friend uses xor, even though sub actually leaves cleaner flags behind. So he suspects it's some kind of social convention, selected at random and then propagated via spurious supporting arguments (or because it “looks cooler”, as a bit of a term of art).

It could also be a result of most people working in assembly being aware of the properties of logic gates, so they carry the intuition that xor might somehow be cheaper under the hood.
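
To make the "cleaner flags" remark concrete, here is a rough Python model of the flag results that the Intel manuals document for XOR and SUB. The function names and the dictionary layout are made up for illustration, and reading the remark as being about AF is my assumption. `None` stands for "architecturally undefined".

```python
# Sketch of documented x86 flag results; names here are hypothetical helpers.

def parity_even(value):
    """PF is set when the low byte of the result has an even number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2 == 0

def xor_flags(a, b, bits=32):
    """Return (result, flags) for XOR a, b."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    result = (a ^ b) & mask
    return result, {
        "CF": False,               # XOR always clears CF
        "OF": False,               # XOR always clears OF
        "ZF": result == 0,
        "SF": bool(result & sign),
        "PF": parity_even(result),
        "AF": None,                # documented as undefined after XOR
    }

def sub_flags(a, b, bits=32):
    """Return (result, flags) for SUB a, b."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return result, {
        "CF": a < b,                                # unsigned borrow
        "OF": bool((a ^ b) & (a ^ result) & sign),  # signed overflow
        "ZF": result == 0,
        "SF": bool(result & sign),
        "PF": parity_even(result),
        "AF": (a & 0xF) < (b & 0xF),                # borrow out of the low nibble
    }

# Zeroing a register: both operands are the same value.
_, after_xor = xor_flags(0x12345678, 0x12345678)
_, after_sub = sub_flags(0x12345678, 0x12345678)
```

In this model the two idioms agree on every flag except AF, which `sub` leaves at a defined value (clear) and `xor` leaves undefined.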

zahlman 6 hours ago

GP seems to think it strange that "x86" would actually not have a performance difference here.

I think this might be due to not realizing just how far back in CPU history this goes.

wongarsu 5 hours ago

In a clockless CPU design you'd indeed expect xor to be faster. But in a regular CPU with a clock you either waste a bit of xor performance by making xor and sub both take the same number of ticks, or you speed up the clock enough that the speed difference between xor and sub justifies sub being at least a full tick slower.

The former just seems way more practical.
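
A toy calculation of that trade-off, as a sketch only: the gate delays below are invented numbers chosen to make the arithmetic easy, not measurements of any real CPU, and `ticks` and `latency_ps` are hypothetical helper names.

```python
# Hypothetical combinational delays in picoseconds (illustrative, not measured).
DELAY_PS = {"xor": 200, "sub": 500}

def ticks(op, clock_period_ps):
    """Whole clock ticks an operation needs at the given clock period."""
    return -(-DELAY_PS[op] // clock_period_ps)  # integer ceiling division

def latency_ps(op, clock_period_ps):
    """Wall-clock time the operation occupies, rounded up to whole ticks."""
    return ticks(op, clock_period_ps) * clock_period_ps

# Option 1: clock slow enough for sub; both ops take one tick.
slow_clock = max(DELAY_PS.values())
# Option 2: clock matched to xor; sub has to span several ticks.
fast_clock = min(DELAY_PS.values())

for period in (slow_clock, fast_clock):
    for op in DELAY_PS:
        print(period, op, ticks(op, period), latency_ps(op, period))
```

With these made-up delays, the slow clock gives one tick for both operations, while the fast clock makes sub take three ticks and 600 ps in total, which is longer than the 500 ps it needed under the slow clock.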

dbdr 4 hours ago

Even if they take the same number of ticks, shouldn't xor fundamentally needing less work also mean it can be performed while drawing less power and generating less heat, which is just as much of an improvement in the long run?

MBCook 3 hours ago

That wasn’t much of a concern in the 70s and 80s.

3form 10 hours ago

I think an even more likely explanation would be that x86 assembly programmers often were, or learned from, assembly programmers on other architectures. Maybe there's an architecture where xor makes more sense and the habit can be attributed to it. The 6502 and 68k are the first places I would look.

richrichardsson 10 hours ago

For 68k, depending on the operand size you're interested in, it mostly doesn't matter.

.b and .w -> clr, eor, and sub are all identical

For .l, moveq #0 is the winner.

bonzini 5 hours ago

6502 doesn't even have register-to-register ALU operations, so there's no alternative to LDA #0.

8080/Z80 is probably where XOR A got a lead over SUB A, but there, too, they take the same number of cycles.