shawn_w 10 hours ago

Quite a few architectures have a dedicated 0 register.

monocasa 16 minutes ago

Very few architectures have a NaT bit, though.

repelsteeltje 10 hours ago

Yep. The XOR trick, relying on a special use of an opcode rather than a special register, is probably related to the limited number of (general-purpose) registers in typical '70s-era CPU designs (8080, 6502, Z80, 8086).
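To put numbers on the opcode-versus-immediate trade-off for the 8086, here is a minimal sketch that hand-assembles three ways of zeroing a 16-bit register. It assumes 16-bit register-to-register forms only, and the helper names are made up for illustration, not taken from any real assembler.

```python
# Hypothetical mini-assembler for three 8086 ways of zeroing a 16-bit register.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

def modrm_reg_reg(dst: str, src: str) -> int:
    # mod=11 (register direct), reg field = source, r/m field = destination
    return 0xC0 | (REG16[src] << 3) | REG16[dst]

def xor_self(reg: str) -> bytes:
    # XOR r/m16, r16 is opcode 31 /r
    return bytes([0x31, modrm_reg_reg(reg, reg)])

def sub_self(reg: str) -> bytes:
    # SUB r/m16, r16 is opcode 29 /r
    return bytes([0x29, modrm_reg_reg(reg, reg)])

def mov_zero(reg: str) -> bytes:
    # MOV r16, imm16 is B8+rw followed by a little-endian 16-bit immediate
    return bytes([0xB8 + REG16[reg]]) + (0).to_bytes(2, "little")

for reg in ("ax", "bx"):
    for name, enc in (("xor", xor_self), ("sub", sub_self), ("mov", mov_zero)):
        print(f"{name} {reg}, ...: {enc(reg).hex(' ')}")
```

XOR and SUB both come out at two bytes against three for MOV, so code size alone favours either opcode trick over the immediate load, but doesn't pick XOR over SUB.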

classichasclass 5 hours ago

Unfortunately, 6502 can't XOR the accumulator with itself. I don't recall if the Z80 can, and loading an immediate 0 would be most efficient on those anyway.

blywi 5 hours ago

XOR A absolutely works on the Z80, and it's of course faster and shorter than loading a zero value with LD A,0: LD A,0 is encoded in 2 bytes, while XOR A is a single-byte opcode. XOR A has the additional benefit of resetting the carry flag (along with H and N); since the result is zero, it sets Z. SUB A will also clear the accumulator, but it always sets the N flag on the Z80.
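For reference, here is a minimal model of the documented Z80 flag effects of XOR and SUB on the accumulator, plus the opcode size and timing of each zeroing option. It is a sketch: the function names are illustrative, and the undocumented flag bits 3 and 5 are ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flags:
    s: int   # sign
    z: int   # zero
    h: int   # half-carry
    pv: int  # parity/overflow
    n: int   # add/subtract
    c: int   # carry

def even_parity(v: int) -> int:
    return int(bin(v & 0xFF).count("1") % 2 == 0)

def xor_a(a: int, r: int) -> tuple[int, Flags]:
    """XOR r into A: H, N and C are reset, P/V reports parity of the result."""
    res = (a ^ r) & 0xFF
    return res, Flags(s=res >> 7, z=int(res == 0), h=0, pv=even_parity(res), n=0, c=0)

def sub_a(a: int, r: int) -> tuple[int, Flags]:
    """SUB r from A: N is set, P/V reports signed overflow."""
    res = (a - r) & 0xFF
    h = int((a & 0x0F) < (r & 0x0F))            # borrow from bit 4
    c = int(a < r)                              # borrow out of bit 7
    v = int(((a ^ r) & (a ^ res) & 0x80) != 0)  # signed overflow
    return res, Flags(s=res >> 7, z=int(res == 0), h=h, pv=v, n=1, c=c)

# Opcode bytes and T-states for the three ways of zeroing A
ZERO_A = {
    "XOR A": (bytes([0xAF]), 4),
    "SUB A": (bytes([0x97]), 4),
    "LD A,0": (bytes([0x3E, 0x00]), 7),   # leaves the flags untouched
}

for name, (code, tstates) in ZERO_A.items():
    print(f"{name:7} {code.hex(' '):6} {tstates} T-states")
print(xor_a(0x5A, 0x5A))  # (0, Flags(s=0, z=1, h=0, pv=1, n=0, c=0))
print(sub_a(0x5A, 0x5A))  # (0, Flags(s=0, z=1, h=0, pv=0, n=1, c=0))
```

With A equal to the operand, both instructions produce 0 with carry clear and Z set; the visible differences are N (set by SUB) and P/V (even parity after XOR, no overflow after SUB). LD A,0 leaves the flags alone, which is occasionally exactly what you want.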

eichin 15 minutes ago

Yeah, the article seems to have missed the likely biggest reason this is the popular x86 idiom: it was already the popular 8080/Z80 idiom from the CP/M era, and there's a direct line from one to the other. A bunch of early 8086 DOS applications were mechanically translated from 8080 assembly code, so while they are "different" architectures, they're still solidly related.
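The 8086 register set was laid out so 8080 registers map onto it one-to-one (A to AL, BC to CX, DE to DX, HL to BX), which is what made mechanical translation practical. Below is a toy translator for a hypothetical handful of register-only 8080 mnemonics; it ignores memory operands and any flag differences, and is meant only to show that the zeroing idiom survives translation unchanged.

```python
# 8080 -> 8086 register correspondence: A->AL, B/C->CH/CL, D/E->DH/DL, H/L->BH/BL
REG_MAP = {"A": "AL", "B": "CH", "C": "CL", "D": "DH", "E": "DL", "H": "BH", "L": "BL"}

def translate(line: str) -> str:
    """Translate a tiny, hypothetical subset of 8080 register instructions to 8086."""
    op, _, args = line.strip().partition(" ")
    ops = [a.strip() for a in args.split(",")] if args else []
    if op == "XRA":                 # XRA r: A ^= r
        return f"XOR AL, {REG_MAP[ops[0]]}"
    if op == "SUB":                 # SUB r: A -= r
        return f"SUB AL, {REG_MAP[ops[0]]}"
    if op == "MVI":                 # MVI r, n: r = n
        return f"MOV {REG_MAP[ops[0]]}, {ops[1]}"
    if op == "MOV":                 # MOV d, s: d = s
        return f"MOV {REG_MAP[ops[0]]}, {REG_MAP[ops[1]]}"
    raise ValueError(f"not in this toy subset: {line!r}")

for src in ("XRA A", "MVI B, 0", "MOV H, A"):
    print(f"{src:10} -> {translate(src)}")
```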

classichasclass 4 hours ago

Ah, thanks, I couldn't recall off the top of my head.

repelsteeltje 4 hours ago

You're absolutely right, I stand corrected.

The 6502 gets by with an immediate load: 2 clock cycles, 2 bytes (frequently followed by a single-byte register transfer instruction). Out of curiosity, I did a quick scan of the MOS 1.20 ROM of the BBC Micro:

  LDY #0 (a0 00): 38 hits
  LDX #0 (a2 00): 28 hits
  LDA #0 (a9 00): 48 hits
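A quick scan like this can be approximated with a raw byte search. The sketch below assumes the ROM is available as a binary file passed on the command line (an assumed input, not something described above). Because it doesn't disassemble, it also counts matching bytes inside data tables, so its totals are upper bounds on the real instruction counts.

```python
import sys
from pathlib import Path

# 6502 load-immediate-zero encodings: opcode byte followed by a 0x00 operand
PATTERNS = {
    "LDY #0": bytes([0xA0, 0x00]),
    "LDX #0": bytes([0xA2, 0x00]),
    "LDA #0": bytes([0xA9, 0x00]),
}

def scan(image: bytes) -> dict[str, int]:
    """Count raw occurrences of each pattern in a ROM image.

    bytes.count() counts non-overlapping matches; these two-byte patterns
    can't overlap themselves, so nothing is missed. Data that happens to
    contain the same bytes is counted too, so the totals are upper bounds.
    """
    return {name: image.count(pat) for name, pat in PATTERNS.items()}

if __name__ == "__main__" and len(sys.argv) > 1:
    rom = Path(sys.argv[1]).read_bytes()   # path to a ROM dump (assumed input)
    for name, hits in scan(rom).items():
        print(f"{name} ({PATTERNS[name].hex(' ')}): {hits} hits")
```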

bonzini 5 hours ago

The Z80 can do either LD A,0 or SUB A or XOR A, but the LD is slower due to the extra memory cycle to load the second byte of the instruction.

wongarsu 5 hours ago

And, as mentioned in the article, even modern x86 implementations have a zero register. So you have this weird special opcode that, when called with identical source and destination, only triggers register renaming.
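The renaming behaviour can be pictured with a toy renamer. This is an illustrative model, not how any particular Intel or AMD core is built: the physical register names, the single hardwired zero entry, and the set of recognised idioms are all assumptions.

```python
from itertools import count

class Renamer:
    """Toy register renamer with a hardwired physical zero register (hypothetical model)."""
    ZERO = "p0"

    def __init__(self, arch_regs):
        self._fresh = (f"p{i}" for i in count(1))
        self.table = {r: next(self._fresh) for r in arch_regs}
        self.executed = []          # uops that actually need an execution unit

    def rename(self, op, dst, src):
        if op in ("xor", "sub") and dst == src:
            # Zeroing idiom: point dst at the zero register; no uop, no source read.
            self.table[dst] = self.ZERO
            return
        srcs = (self.table[dst], self.table[src])   # two-operand form also reads dst
        new = next(self._fresh)
        self.executed.append((op, new, srcs))
        self.table[dst] = new

r = Renamer(["eax", "ebx"])
r.rename("xor", "eax", "eax")   # eax now maps to p0, nothing is executed
r.rename("add", "eax", "ebx")   # reads p0 and ebx's mapping, writes a fresh register
print(r.table)
print(r.executed)
```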

bonzini 5 hours ago

A move on SPARC is technically an OR of the source with the zero register: "mov %l0, %l1" is assembled as "or %g0, %l0, %l1". So if you want to zero a register, you OR %g0 with itself ("clr %l1" is assembled as "or %g0, %g0, %l1").
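Here is a minimal sketch of how the synthetic instructions fall out of OR plus a hardwired %g0. The class and method names are invented for illustration, registers are limited to the register-only form of mov, and untouched registers read as 0 purely as a simplification.

```python
class SparcRegs:
    """Toy SPARC-style integer register file: %g0 always reads 0 and ignores writes."""

    def __init__(self) -> None:
        self._regs: dict[str, int] = {}

    def read(self, name: str) -> int:
        if name == "%g0":
            return 0
        return self._regs.get(name, 0)        # simplification: untouched registers read 0

    def write(self, name: str, value: int) -> None:
        if name != "%g0":                     # writes to %g0 are discarded
            self._regs[name] = value & 0xFFFFFFFF

    def or_(self, rs1: str, rs2: str, rd: str) -> None:
        self.write(rd, self.read(rs1) | self.read(rs2))

    def mov(self, rs: str, rd: str) -> None:
        self.or_("%g0", rs, rd)               # mov rs, rd  ->  or %g0, rs, rd

    def clr(self, rd: str) -> None:
        self.or_("%g0", "%g0", rd)            # clr rd      ->  or %g0, %g0, rd

regs = SparcRegs()
regs.write("%l0", 42)
regs.mov("%l0", "%l1")
print(regs.read("%l1"))   # 42
regs.clr("%l1")
print(regs.read("%l1"))   # 0
```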

lynguist 10 hours ago

Indeed!!

MIPS - $zero

RISC-V - x0

SPARC - %g0

ARM64 - XZR

classichasclass 5 hours ago

PowerPC: "r0 occasionally" (with certain instructions like addi, though this might be better considered an edge case of encoding)
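The PowerPC case is the "(rA|0)" rule: in addi, an rA field of 0 selects the literal value 0 instead of r0's contents, which is how li is built. A minimal 32-bit sketch follows; the list-based register file and the function names are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

def addi(gpr: list[int], rd: int, ra: int, simm: int) -> None:
    """addi rD,rA,SIMM on a 32-bit model: an rA field of 0 means the value 0, not r0."""
    if not -0x8000 <= simm <= 0x7FFF:
        raise ValueError("SIMM is a signed 16-bit immediate")
    base = 0 if ra == 0 else gpr[ra]
    gpr[rd] = (base + simm) & MASK32

def li(gpr: list[int], rd: int, value: int) -> None:
    """li rD,value is the extended mnemonic for addi rD,0,value."""
    addi(gpr, rd, 0, value)

gpr = [0] * 32
gpr[0] = 1234          # r0 holds a live value...
li(gpr, 3, 0)          # ...but li still yields 0
addi(gpr, 4, 3, 5)     # r4 = r3 + 5
print(gpr[3], gpr[4])  # 0 5
```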

Findecanor 2 hours ago

On 64-bit ARM, the same register number is XZR in some instructions and the stack pointer in others.
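A small lookup shows what that means in practice: register number 31 decodes differently depending on the instruction and operand field. The table below is an illustrative subset chosen for this sketch; the full per-encoding rules live in the Arm Architecture Reference Manual.

```python
# Meaning of register number 31 for a few AArch64 operand fields (illustrative subset only).
REG31 = {
    ("ADD (immediate)", "Rd"): "SP",
    ("ADD (immediate)", "Rn"): "SP",
    ("ADD (shifted register)", "Rd"): "XZR",
    ("ADD (shifted register)", "Rn"): "XZR",
    ("ORR (shifted register)", "Rn"): "XZR",
    ("LDR (immediate)", "Rn"): "SP",        # base register
}

def reg_name(insn: str, field: str, num: int) -> str:
    """Return the 64-bit register name that a 5-bit register field decodes to."""
    if not 0 <= num <= 31:
        raise ValueError("register fields are 5 bits wide")
    if num < 31:
        return f"X{num}"
    return REG31[(insn, field)]   # KeyError for encodings not in this toy table

# "mov x0, sp" is an alias of ADD (immediate) with #0, so Rn=31 means SP;
# "mov x0, x1" is an alias of ORR (shifted register) with Rn=31, i.e. XZR.
print(reg_name("ADD (immediate)", "Rn", 31))         # SP
print(reg_name("ORR (shifted register)", "Rn", 31))  # XZR
```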

matja 3 hours ago

Alpha: r31, f31

signa11 10 hours ago

indeed. riscv for instance. also, afaik, xor’ing is faster. i would assume that someone like mr. raymond would know…

IshKebab 10 hours ago

> afaik, xor’ing is faster

Even tiny tiny CPUs can do SUB in one cycle, so I doubt that. On superscalar CPUs, XOR and SUB are normally issued to the same execution units, so it wouldn't make a difference there either.

tliltocatl 10 hours ago

On superscalar CPUs, running the XOR trick as written would be significantly slower, because it implies a data dependency where there isn't one. But all out-of-order x86 cores optimize it away internally.
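The false dependency is easy to see in a toy dependency tracker. This sketch assumes two-operand x86-style instructions where the destination is also a source, uses a made-up three-instruction program (with imul standing in for any long-latency producer), and compares a core that treats "xor r, r" as an ordinary read against one that recognises it as a zeroing idiom.

```python
def dependencies(program, recognise_zero_idioms):
    """For each instruction, list the earlier instructions whose results it must wait for."""
    last_writer = {}          # register -> index of the most recent instruction writing it
    result = []
    for i, (op, dst, src) in enumerate(program):
        if recognise_zero_idioms and op in ("xor", "sub") and dst == src:
            reads = set()     # result is 0 whatever the old value was
        else:
            reads = {dst, src}  # two-operand x86 form: dst is also a source
        result.append(sorted({last_writer[r] for r in reads if r in last_writer}))
        last_writer[dst] = i
    return result

program = [
    ("imul", "eax", "ecx"),   # long-latency producer of eax
    ("xor", "eax", "eax"),    # zeroing idiom
    ("add", "eax", "edx"),    # consumes the zeroed eax
]
print(dependencies(program, recognise_zero_idioms=False))  # [[], [0], [1]]
print(dependencies(program, recognise_zero_idioms=True))   # [[], [], [1]]
```

Without idiom recognition, the XOR has to wait for the multiply even though its result can't depend on it; with recognition, the chain is broken and the XOR can proceed immediately.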

pif 10 hours ago

Which part of "mathematical operations don’t reset the NaT bit" did you not understand?
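To make the NaT point concrete, here is a simplified model of Itanium-style NaT propagation. The class and function names are illustrative; the model only captures the rule that computational instructions set the result's NaT if any source is NaT, while r0 is hardwired to zero with NaT clear.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1

@dataclass(frozen=True)
class GR:
    """Toy model of an Itanium general register: 64-bit value plus NaT bit."""
    value: int
    nat: bool = False

R0 = GR(0)  # r0 always reads as zero with NaT clear

def xor(a: GR, b: GR) -> GR:
    # Computational ops propagate NaT: the result is NaT if any source is NaT.
    return GR((a.value ^ b.value) & MASK64, a.nat or b.nat)

def adds(imm: int, src: GR) -> GR:
    # "mov rd = rs" is a pseudo-op for "adds rd = 0, rs"
    return GR((src.value + imm) & MASK64, src.nat)

deferred = GR(0, nat=True)       # e.g. the target of a speculative load that deferred a fault
print(xor(deferred, deferred))   # GR(value=0, nat=True): XOR doesn't clear NaT
print(adds(0, R0))               # GR(value=0, nat=False): copying r0 does
```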