Remix.run Logo
tzs 2 days ago

Intel missed a very simple opportunity to vastly simplify memory models on the 80286 for software that ran in protected mode, such as OS/2 and various Unix or Unix-like systems.

In real mode memory addressing works as described in the article. A 2-byte segment number and a 2-byte offset are combined to produce the memory address. The translation from segment:offset to physical address is:

  physical_address = segment * 16 + offset
Note that you can't just treat segment:offset as a 32-bit value and add 1 to get the address of the next byte. When you treat a segment:offset as a 32-but address the space is not mapped linearly to physical addresses and that's the crux of what makes it annoying.

In protected mode the segment number is replaced with a selector. A selector is also 2-bytes but it is no longer just a single number. It is 3 fields:

• 13-bit selector number (SEL)

• 2 bit request privilege level (RL)

• 1 bit table indicator (T)

The way a selector:offset is translated to a physical address is:

• There are two "descriptor tables", the Local Descriptor Table (LDT) and the Global Descriptor Table (GDT). A descriptor is a data structure that contains the physical address of a block of memory, the length of the block, and some privilege information. The LDT is for memory of the current process, and the GDT is for memory shared by all processes such as the memory of the operating system.

• The selector number SEL is used as an index into one of those tables to find a descriptor. The table indicator bit T selects which table.

• The request privilege level RL is checked agains the privilege information from the descriptor, and the offset is checked against the length of the block described by the descriptor. If those checks pass then:

  physical_address = address_from_descriptor + offset
(The 80386 is similar except segments/selectors and offsets are 32-bits and if paging is enabled the address in a descriptor is a virtual address for the paging system rather than a physical address. Most operating system simply run everything in small model, and use the paging unit to do all their memory management).

Here's how they packed SEL, RL, and T into a 16-bit selector.

  +-------------+-+--+
  | selector    |T|RL|
  +-------------+-+--+
If you wanted to treat a selector:offset is a 32-bit value it looked like this:

  +-------------+-+--+----------------+
  | selector    |T|RL| offset         |
  +-------------+-+--+----------------+
Note that this still suffers from the same problem that made treating real mode segment:offsets as 32-bit values annoying. Adding 1 doesn't give you the next address when offset wraps.

If they had just laid out SEL, RL, and T a little differently in the selector they could have fixed that. Just put SEL in the least significant bits instead of the most significant bits:

  +--+-+-------------+----------------+
  |RL|T| selector    | offset         |
  +--+-+-------------+----------------+
Then if adding 1 to a pointer wraps offset to 0 is will increment SEL. As long as the operating system sets up the descriptor table so that the memory blocks describe by the descriptors do not overlap the program would see a 29-bit linear address space (30-bit if the T bit is next to SEL).

(If the OS needed to run a program that did need an address space with the kind of overlap that real mode has it could set up the LDT for the process so that the descriptors did describe overlapping memory blocks).

If Intel had done this compilers for 286 protected mode would have only needed small model and compiler writers, library writers, and programmers would have been much happier.

So why didn't they?

One guess I've heard is that since a descriptor table entry is 8 bytes, by putting SEL in the top bits of the selector and the other 3 bits worth of fields in the bottom they didn't have shift SEL to turn it into an offset from the base of the descriptor table. If SEL were at the bottom it would need to be shifted by 3 to make it an offset into a descriptor table.

I've talked to CPU designers (but none who worked on 80286) and they have told me for this kind of thing where you would always want to shift an input by a fixed amount building that shift in is essentially free, so that doesn't seem to be the explanation.

rep_lodsb 2 days ago | parent [-]

It's free in hardwired logic, but the descriptor tables are accessed by microcode. The 286 already had a constant ROM that could hold the FFF8h mask for an ALU AND operation, but no fast shift by multiple bits.

This might be one reason for it, but note that page table entries also use the lower bits for attributes. This means that software can use AND to separate the physical address and other bits, and OR to combine them, with no shift operation required.