dlcarrier 2 hours ago

It would probably run really fast, considering that Itanium's downfall was the difficulty of compiling for it (including translating x86 instructions into Itanium instructions).

tliltocatl 2 hours ago | parent [-]

Not really. Itanium was the result of some people at Intel being obsessed with LINPACK benchmarks and forgetting everything else. It sucked at random memory access, and hence at everything that isn't floating-point number-crunching. The compiler can't hide memory access latency because it's fundamentally unpredictable. VLIW does magic for floating-point latency (which is predictable), but:

- As transistors got smaller, FP performance increased, memory latency stayed the same (or even increased).

- If you are doing a lot of floating point, you are probably doing array processing, so you might as well go for a GPU (or at least SIMD).

- Low instruction density is bad for the I-cache. Yes, RISC fans, density matters! And VLIW is an absolute disaster in that regard. Again, this is less visible in number-crunching loads, where the processor executes relatively small loops many times over.
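A toy sketch of the two access patterns at issue (all names here are invented for illustration): in an array sum, every address is known up front, so a compiler or VLIW scheduler can overlap the loads; in a pointer chase, the next address is only known once the current read completes, forming a serial dependency chain that no static schedule can hide.

```python
import random

N = 1000
random.seed(0)

# Predictable access: addresses 0..N-1 are known before any read happens,
# so loads can be issued ahead of time and overlapped.
values = list(range(N))
array_sum = sum(values)

# Pointer chase over the same values: each node stores the index of the
# next node, so the next "address" is only known after the current read.
perm = list(range(N))
random.shuffle(perm)
next_of = {}
for i in range(N - 1):
    next_of[perm[i]] = perm[i + 1]
next_of[perm[-1]] = None

chase_sum = 0
node = perm[0]
while node is not None:
    chase_sum += values[node]   # read the current node
    node = next_of[node]        # next address known only now
print(array_sum, chase_sum)     # both sums are 499500
```

Both loops touch the same data, but only the first exposes its addresses to the scheduler; the second is exactly the "fundamentally unpredictable" case above.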

fjjfnrnr an hour ago | parent [-]

Naive question: shouldn't VLIW be beneficial for memory access, since each instruction does quite a lot of work, thus giving the memory time to fetch the next instruction?

tliltocatl 29 minutes ago | parent [-]

- Even if each instruction does a lot of work, it is supposed to do it in parallel, so the time available to fetch the next instruction is (supposed to be) the same.

- Not everything is parallelisable, so most instruction words end up full of NOPs.

- The real problem is data reads. Instruction fetches are fairly predictable (and when they aren't, OOO sucks just as much), but data reads aren't. An OOO core can do something else until the data comes in. VLIW, or any in-order architecture, must stall as soon as an instruction depends on the result of the read.
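That last point can be sketched with a toy scheduling model (everything here is invented for illustration, not a real pipeline): one long-latency load, a dependent use, and one independent instruction. The in-order machine stalls at the use, trapping the independent work behind it; the idealised out-of-order machine slips the independent work under the load's latency.

```python
LOAD_LATENCY = 10

# (name, destination register, source registers, latency in cycles)
trace = [
    ("load",  "r1", [],     LOAD_LATENCY),  # data read: issues now, result late
    ("use",   "r2", ["r1"], 1),             # depends on the loaded value
    ("indep", "r3", [],     1),             # independent of the load
]

def in_order_cycles(trace):
    """Issue one instruction per cycle in program order,
    stalling until all of an instruction's sources are ready."""
    ready = {}  # register -> cycle its value becomes available
    cycle = 0
    for _name, dest, srcs, lat in trace:
        start = max([cycle] + [ready[s] for s in srcs])
        ready[dest] = start + lat
        cycle = start + 1
    return max(ready.values())

def ooo_cycles(trace):
    """Each cycle, issue every pending instruction whose sources are
    ready (an idealised OOO core with unlimited issue width)."""
    ready = {}
    pending = list(trace)
    cycle = 0
    while pending:
        issued = [ins for ins in pending
                  if all(s in ready and ready[s] <= cycle for s in ins[2])]
        for _name, dest, _srcs, lat in issued:
            ready[dest] = cycle + lat
        pending = [ins for ins in pending if ins not in issued]
        cycle += 1
    return max(ready.values())

print(in_order_cycles(trace))  # 12: "indep" waits behind the stalled "use"
print(ooo_cycles(trace))       # 11: "indep" executes during the load's latency
```

The gap grows with the amount of independent work sitting behind the stalled consumer, which is exactly what an in-order VLIW leaves on the table.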