mysterion1 13 days ago

Wouldn't it have been simpler for Intel to design a chip with those 8 identical instructions (xfer, shift, add, arith, far jmp, far call, local jmp, misc), but read and executed from normal user-accessible RAM, perhaps with a tiny cache, instead of all these ROM/microcode special compression/hidden architecture shenanigans?

adrian_b 13 days ago | parent | next [-]

This is exactly the theory behind RISC and VLIW processors. They replaced the vertical and horizontal microprograms, respectively, which the processors of the seventies stored in ROMs, with normal programs made of simple instructions. These programs were normally executed from fast cache memories, thus achieving the same speed as microprograms.

However, when the 8087 was designed, RISC and VLIW processors were still in the future, because a fast cache memory allowing the execution of an instruction per clock cycle was still far too expensive in comparison with a microprogram ROM.

Most earlier floating-point accelerators were microprogrammed like the 8087, with the microprograms stored in a ROM. However, there was also the FPS AP-120B, introduced by the company Floating Point Systems in 1976. This was a floating-point accelerator for minicomputers, like the DEC PDP-11 or VAX, which was marketed as a "supercomputer for the poor".

The FPS AP-120B was a VLIW processor launched 7 years before the term "VLIW" was coined. This means that it was a horizontally microprogrammed processor (i.e. with multiple concurrent operations specified by each microinstruction), where the microprogram was not stored in a ROM but was fed into the accelerator by the host computer. Therefore the user could write such microprograms for it directly, to implement optimized computational algorithms.

Nevertheless, while the FPS AP-120B was said to be a "supercomputer for the poor", "poor" was meant only in comparison with those who could afford to buy a Cray-1. Such a "cheap" array processor still had a price more than 100 times greater than that of an Intel 8087.

By the time RISC and VLIW CPUs became fashionable, using microinstructions as simple as those of the Intel 8087 to implement floating-point operations was no longer acceptable, because having to execute tens or hundreds of simple instructions for each FP operation was deemed too slow. Therefore the instruction sets of RISC and VLIW CPUs were eventually extended to include FP operations as single instructions, which had to be implemented in complex hardware in order to achieve an execution throughput of one instruction per clock cycle.
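
To give a rough feel for the "tens or hundreds of simple instructions" point, here is a minimal Python sketch of multiplying two 64-bit significands using only shift, add and bit-test steps. This is an illustration only, not the 8087's actual microcode: the function name, the step counting and the plain shift-and-add algorithm are all assumptions made for the example.

```python
def shift_add_multiply(a, b, width=64):
    """Multiply two unsigned `width`-bit integers with shift/add steps only.

    Returns (product, steps), where `steps` counts the simple operations
    (adds and shifts) that a microcoded loop would have to issue.
    Hypothetical helper for illustration; not modelled on real 8087 microcode.
    """
    product = 0
    steps = 0
    for _ in range(width):
        if b & 1:            # test the low bit of the multiplier
            product += a     # conditional add of the shifted multiplicand
            steps += 1
        a <<= 1              # shift multiplicand left
        b >>= 1              # shift multiplier right
        steps += 2
    return product, steps


print(shift_add_multiply(3, 5, width=8))  # (15, 18)
```

Even this bare loop needs between 128 and 192 simple steps for a 64-bit significand, before normalization, rounding and exponent handling, which is why a single-cycle FP instruction needs dedicated hardware such as a parallel multiplier.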

dexen 13 days ago | parent [-]

Excellent post, thank you

kens 13 days ago | parent | prev | next [-]

That's basically the RISC approach, using simple one-clock instructions instead of complex microcoded instructions. In the case of the 8087, it made sense to use microcode because the 8087 is running in parallel with the regular 8086 processor. If the 8087 is constantly fetching micro-instructions from RAM, it will get in the way of the 8086. (Note that RISC chips rapidly added floating-point units, even though that goes against the strict RISC ideology.)

userbinator 13 days ago | parent | next [-]

This is also why RISC would never have happened if it weren't for the fact that, for a brief period in the history of computing, RAM was faster than the core. Single-cycle instructions only make sense if the fetch can keep up.

cyberax 13 days ago | parent [-]

I don't think this is correct? You can certainly fetch more than one cycle's worth of data each cycle even on modern RAM.

It's a question of throughput that can be extended cheaply enough.

gblargg 12 days ago | parent | next [-]

I'm reading that L1 takes 4-5 cycles to read on modern CPUs, whereas it was just one cycle in the late 1980s.

userbinator 13 days ago | parent | prev [-]

I was referring to the 80s when RISCs were first invented.

mysterion1 13 days ago | parent | prev [-]

Judging by the register area bit density, it seems it would have had space for a 3-5 kbit SRAM cache (replacing the 26,368-bit ROM). I wonder if the basic 4 ops plus some approximation functions like sqrt would fit in there. Purely alternative history ;)

russdill 13 days ago | parent | prev [-]

You're then moving and storing a lot of repetitive instruction data.