The non-obvious bit is why there isn't an even faster and shorter "mov <register>,0" instructions - the processors started short-circuiting xor <register>,<register> much later.

▲

defmacr0 5 hours ago | parent | next [-]

In x86, a basic immediate instruction with a 1 Byte immediate value is encoded like this:

<op> (1 Byte opcode), <Registers> (1 Byte), <immediate value> (1 Byte)

While xor eax, eax only uses 2 bytes. Since there are only 8 registers, meaning they can be encoded with 3 bits, you can pack two values into the <Registers> field (ModR/M).

Making mov eax, 0 only take two bytes would require significant changes of the ISA to allow immediate values in the ModR/M byte (or similar) but there would be little benefit since zeroing can already be done in 2 bytes and I doubt that other cases are even close to frequent enough for this to be any significant benefit overall. An actual improvement would be if there was a dedicated 1 Byte set-rax-to-0 instruction, but obviously that comes at a tradeoff where we have to encode another operation differently (probably with more bytes) again (and you can't zero anything else with it).

https://wiki.osdev.org/X86-64_Instruction_Encoding

https://pyokagan.name/blog/2019-09-20-x86encoding/

	▲	rep_lodsb 3 hours ago \| parent [-]
		Some other architectures like PDP-11 and 680x0 had a dedicated "clear register" instruction. It could have been added to x86, even as a group of single-byte opcodes with the register encoded in three bits (as with PUSH, POP, and INC/DEC outside of long mode). But the XOR idiom was already established on the 8080 by that point.

▲

HarHarVeryFunny 5 hours ago | parent | prev | next [-]

A number of the RISC processors have a special zero register, giving you a "mov reg, zero" instruction.

Of course many of the RISC processors also have fixed length instructions, with small literal values being encoded as part of the instruction, so "mov reg, #0" and "mov reg, zero" would both be same length.

▲

drob518 5 hours ago | parent | prev | next [-]

Right, like a “set reg to zero” instruction. One byte. Just encodes the operation and the reg to zero. I’m surprised we didn’t have it on those old processors. Maybe the thinking was that it was already there: xor reg,reg.

▲

bonzini 5 hours ago | parent | next [-]

One byte instructions, with 8 registers as in the 8086, waste 8 opcodes which is 3% of the total. There are just five: "INC reg", "DEC reg", "PUSH reg", "POP reg", "XCHG AX, reg" (which is 7 wasted opcodes instead of 8, because "XCHG AX, AX" doubles as NOP).

One-byte INC/DEC was dropped with x86-64, and PUSH/POP are almost obsolete in APX due to its addition of PUSH2/POP2, leaving only the least useful of the five in the most recent incantation of the instruction set.

▲

drob518 5 hours ago | parent [-]

I’m not sure I understand what you mean by “waste 8 opcodes.”

▲

GuB-42 4 hours ago | parent | next [-]

There are only 256 1-byte opcodes or prefixes available, if you take 8 of these to zero registers, they won't be available for other instruction, and unless you consider zeroing to be so important that they really need their 1-byte opcodes, it is redundant since you can use the 2-byte "xor reg,reg" instead, hence the "waste'.

In addition, you would need 16 opcodes, not 8, if you also wanted to cover 8 bit registers (AH/AL,...).

Special shout-out to the undocumented SALC instruction, which puts the carry flag into AL. If you know that the carry will be 0, it is a nice sizecoding trick to zero AL in 1 byte.

▲

bonzini 4 hours ago | parent | prev | next [-]

They occupy 8 of the possible 256 byte values. Together, those five cases used about 15% of the space.

Though I was forgetting one important case: MOV r,imm also used one-byte opcodes with the register index embedded. And it came in byte and word variants, so it used a further 16 opcodes bytes for a total of 56 one byte opcodes with register encoding.

	▲	drob518 an hour ago \| parent [-]
		Gotcha, thanks for clarifying. I was reacting to the word “waste” I guess. Surely, as you say, it consumes that opcode encoding space. Whether that’s a waste or not depends on a lot of other things, I suppose. I wasn’t necessarily thinking x86-specific in my original comment. But yea, if you try to zero every possible register and half-word register you would definitely consume lots of encoding space.

▲

LegionMammal978 4 hours ago | parent | prev | next [-]

Traditionally in x86, only the first byte is the opcode used to select the instruction, and any further bytes contain only operands. Thus, since there exist 256 possible values for the initial byte, there are at most 256 possible opcodes to represent different instructions.

So if you add a 1-byte instruction for each register to zero its value, that consumes 8 of the possible 256 opcodes, since there are 8 registers. Traditional x86 did have several groups of 1-byte instructions for common operations, but most of them were later replaced with multibyte encodings to free up space for other instructions.

▲

gpderetta 4 hours ago | parent | prev [-]

special mov 0 instruction times 8 registers. The opcode space, especially 1 byte opcode space, is precious so encoding redundant operations is wasteful.

▲

flohofwoe 5 hours ago | parent | prev [-]

Instruction slots are extremely valuable in 8-bit instruction sets. The Z80 has some free slots left in the ED-prefixed instruction subset, but being prefix-instructions means they could at best run at half speed of one-byte instructions (8 vs 4 clock cycles).

▲

10 hours ago | parent | prev [-]

[deleted]