| ▲ | VWWHFSfQ 5 days ago |
| > used to be a crown-jewel of US tech I feel like x86 itself is kinda legacy tech. So while AMD has made advancements, they're somewhat in the same boat as Intel. It seems like NVIDIA and Micron are the real "crown jewels" of US tech |
|
| ▲ | sho_hn 5 days ago | parent | next [-] |
| Tech-wise, that places too much of a premium on the ISA. Modern processor design is fairly orthogonal to the ISA being exposed. Intel could make exciting RISC-V chips relatively quickly if they wanted to; what stops them, and other companies in the same position, is that they see their existing ecosystem as a strategic asset. |
| |
| ▲ | protimewaster 4 days ago | parent | next [-] | | There's a nice interview with Mike Clark where he talks about this a bit. His take basically matches this: he says that, in his view, any efficiency benefits of ARM exist just because that's been the market for ARM. In his view, if x86 had a market motive for ARM levels of efficiency, they'd be able to deliver it. But, historically, the x86 market wants performance more than efficiency, so that's what it gets. https://www.computerenhance.com/p/an-interview-with-zen-chie... | |
| ▲ | codedokode 5 days ago | parent | prev [-] | | I don't think so. For example, if an ISA requires strict memory ordering, that makes the architecture more complicated than one with relaxed memory ordering, although the latter is a pain to write code for. |
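To make the trade-off concrete, here is a minimal C++ sketch (scenario and names invented for illustration) of the ordering a programmer has to spell out under a relaxed memory model. On a strongly ordered ISA such as x86, the release/acquire pair below typically compiles to plain stores and loads; on a weakly ordered ISA such as ARM, the compiler has to emit barrier or acquire/release instructions to give the same guarantee.

    #include <atomic>
    #include <thread>

    std::atomic<int>  data{0};
    std::atomic<bool> ready{false};

    void producer() {
        data.store(42, std::memory_order_relaxed);     // plain store; may be reordered locally
        ready.store(true, std::memory_order_release);  // publishes everything stored before it
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) {}  // pairs with the release above
        int v = data.load(std::memory_order_relaxed);      // guaranteed to observe 42
        (void)v;
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join();
        t2.join();
    }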
|
|
| ▲ | tester756 5 days ago | parent | prev | next [-] |
| ISA is irrelevant. It's like saying that one programming language's syntax/keywords are better than another's. Everything is about the compiler, libs, runtime, etc. https://chipsandcheese.com/p/arm-or-x86-isa-doesnt-matter Also, some people say that RISC-V is the way to go |
| |
| ▲ | riehwvfbk 5 days ago | parent [-] | | And yet Itanium flopped. | | |
| ▲ | mort96 5 days ago | parent | next [-] | | Itanium is irrelevant to this discussion. x86 works the same as its ARM and RISC-V competitors: a fairly compact, abstract language which describes a program, which depends on an instruction decoder to translate the abstract instructions into microarchitecture-specific instructions. VLIW is a huge departure from that. When people say "ISA doesn't matter", they mean that the "legacy cruft" in x86 doesn't matter (that much) and that x86 remains competitive with other similar ISAs. It doesn't mean that the difference between VLIW and traditional ISAs doesn't matter. ISA paradigm still matters, just not the "syntax". | |
| ▲ | ajross 5 days ago | parent | prev | next [-] | | But not because of its ISA. I mean, to first approximation everything is a "flop" in semiconductor architectures (or really in tech in general). The population of genuinely successful products is a tiny fraction of the stuff people tried to sell. In this particular case: ia64 leaned hard into wide VLIW in an era where growing transistor budgets made it possible to decode and issue traditional instructions in parallel[1]. The Itaniums really were fine CPUs; they just weren't particularly advantageous relative to the P6 cores against which they were competing, so no one bought them. [1] In some sense, VLIW won as a matter of pipeline architecture; it only lost as a design point in ISA specs. Your MacBook is issuing 10 arm64 instructions every cycle, and it doesn't need to futz with the instruction format to do it. | |
| ▲ | wbl 5 days ago | parent [-] | | VLIW came with an implication that static scheduling would win out. The deeply OoO chips you see now have a very different architecture to support that: Itanium was much more a DSP-like thing. | |
| ▲ | ajross 5 days ago | parent | next [-] | | Even in VLIW, DRAM fetches are slow, instructions have variable latency, and write-before-retire register collisions require renaming. Itanium would have gotten there at some point. OoO isn't an optional feature for high-performance systems, and that was clear even in the '90s. | |
| ▲ | wbl 5 days ago | parent [-] | | If you have that, what's the VLIW getting you? | |
| ▲ | ajross 5 days ago | parent | next [-] | | Fewer transistors and pipeline stages required for the decode unit, which is a real but moderate advantage. And it turned out the window was very narrow and the relative win got smaller and smaller over time. And other externalities where VLIW loses moderately, like total instruction size (i.e. icache footprint), turned out to be more important. | |
| ▲ | cesarb 5 days ago | parent [-] | | > Fewer transistors and pipeline stages required for the decode unit, which is a real but moderate advantage. Isn't having fixed-size naturally-aligned instructions (like on 64-bit ARM) enough to get that advantage? | | |
| ▲ | ajross 5 days ago | parent [-] | | Decoding ARM is easier than decoding x86, but fixed-size instructions alone don't get you the whole advantage. VLIW instructions also encode the superscalar pipeline assignments (or a reasonable proxy for them) and are required to be constructed without instruction interdependencies (within a single bundle, anyway), which traditional ISAs need to spend hardware to figure out. Really, VLIW is a fine idea. It's just not that great an idea, and in practice it wasn't enough to save ia64. But it's not what killed it, either. | |
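For a rough picture of the extra information described above, here is an illustrative C++ sketch loosely modeled on the IA-64 bundle format; the field names and widths are simplified for readability and are not the real encoding.

    #include <cstdint>

    // Illustrative only. In IA-64 a bundle is 128 bits: three 41-bit instruction
    // slots plus a 5-bit template that names the execution-unit type (M/I/F/B)
    // for each slot and marks "stops" between independent instruction groups.
    struct Ia64StyleBundle {
        uint8_t  template_field;  // 5 bits in hardware: unit assignments + stop bits
        uint64_t slot[3];         // 41 bits each in hardware: the three instructions
    };

    int main() {
        // The compiler, not the CPU, is responsible for grouping only
        // instructions with no interdependencies, so the decoder can dispatch
        // a bundle without doing dependency analysis of its own.
        Ia64StyleBundle b{};
        (void)b;
    }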
| ▲ | codedokode 4 days ago | parent [-] | | The problem with ia64 was that if you had 1000 legacy x86 applications, written by third-party contractors, many of which you didn't even have the source for, then ia64 had to be 100x better than standard CPUs to justify rewriting the apps. And by the way, that's why open source makes such migrations much cheaper. |
|
|
| |
| ▲ | codedokode 5 days ago | parent | prev [-] | | Out-of-order architectures are inhumanly complex, especially figuring out the dependencies. For example, can we reorder these two instructions, or must we execute them sequentially? ld r1, [r2 + 10]   ; load from address r2 + 10
st [r3 + 4], r4    ; store to address r3 + 4 -- safe to reorder only if it can't alias the load above
And then consider things like speculative execution. | |
| ▲ | 1718627440 3 days ago | parent | next [-] | | Honestly, to me it seems like optimizing compilers and out-of-order CPUs are actually doing the same thing. Can't we get rid of one or the other? Either have a stupid ISA and do all the work ahead of time, with way more compute time to optimize, or don't optimize ahead of time and have a higher-level ISA that also has concepts like pointer provenance. The current state seems like a local minimum: we do ahead-of-time optimization, but the ISA does its thing anyway, the compiler throws much of the information away, and the OoO analysis is time-critical. | |
| ▲ | wbl 3 days ago | parent [-] | | The compiler doesn't know the dynamic state of the CPU memory hierarchy, and you don't want it to. Even the CPU doesn't know until it finds out how long a load will take. Meanwhile, the CPU probably can't do a loop-invariant hoist in a reasonable way or understand high-level semantics. |
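As a concrete, made-up C++ example of the kind of rewrite only the compiler can do here, a loop-invariant hoist (function and variable names are invented): the out-of-order core can overlap the repeated work, but it cannot prove the work is redundant and remove it.

    #include <cmath>
    #include <cstddef>

    // As written by the programmer: sqrt(scale) is recomputed on every iteration.
    void scale_all_naive(double* v, std::size_t n, double scale) {
        for (std::size_t i = 0; i < n; ++i)
            v[i] *= std::sqrt(scale);
    }

    // After loop-invariant code motion by the compiler: computed once. An
    // out-of-order core running the naive version can hide some of the sqrt
    // latency, but it cannot know the value never changes and skip the work.
    void scale_all_hoisted(double* v, std::size_t n, double scale) {
        const double k = std::sqrt(scale);
        for (std::size_t i = 0; i < n; ++i)
            v[i] *= k;
    }

    int main() {}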
| |
| ▲ | wbl 4 days ago | parent | prev [-] | | But you already pay that price anyway. |
|
|
| |
| ▲ | tadfisher 5 days ago | parent | prev [-] | | If only that could have worked, then we could have avoided the whole Spectre/Meltdown mess and resulting mitigations. |
|
| |
| ▲ | ben-schaaf 5 days ago | parent | prev | next [-] | | By all accounts I can find, Itanium performance was good, perhaps even great when writing assembly. That seems to reinforce the point that ISA doesn't really matter. But let's be clear: of course ISA matters. It's just as trivial to make a bad ISA as it is to make a bad syntax. But does the ISA of modern superscalar processors matter? Probably a bit, but certainly not a whole lot. | |
| ▲ | dboreham 5 days ago | parent [-] | | It wasn't good vs its peer competitors at the time (HP-PA, DEC Alpha, IBM RS/6000, even MIPS). And it was very expensive. Huge die. It was an expensive, strange thing that didn't have the necessary 2X performance advantage over its peers to offset those issues. |
| |
| ▲ | 5 days ago | parent | prev | next [-] | | [deleted] | |
| ▲ | lallysingh 5 days ago | parent | prev [-] | | They required unreasonable things from the compiler for instruction scheduling. |
|
|
|
| ▲ | sapiogram 5 days ago | parent | prev | next [-] |
| > I feel like x86 itself is kinda legacy tech. The impact of ISA is overrated; it's much more important that the ISA continues to grow and adapt as CPUs get larger. |
|
| ▲ | FuriouslyAdrift 5 days ago | parent | prev [-] |
| modern x86 chips (for a long time really) are hybrid CISC/RISC at the hardware level. It's at the microcode that the ISA lives and that's changeable. |
| |
| ▲ | cesarb 5 days ago | parent [-] | | > It's at the microcode that the ISA lives and that's changeable. No, it's not. In modern high-speed CPUs, many instructions are decoded directly, without going through the microcode engine. In fact, on several modern Intel CPUs, only one of the instruction decoders can handle microcoded ("complex") instructions, while all the other decoders can only handle non-microcoded ("simple") instructions. It would be more precise to say that it's at the "front-end" part of the core (where the decoders are) that the ISA lives, but even that's not quite true; many ISAs have peculiarities whose effects reach beyond the front end, like the flags on x86. | |
| ▲ | FuriouslyAdrift 5 days ago | parent | next [-] | | It was my understanding that, since the P6 architecture, even directly decoded instructions are still translated by the microcode into the actual signals, both to allow for errata patching and to maintain a common ISA target within a family of processors with different physical characteristics. |
| ▲ | FuriouslyAdrift 5 days ago | parent | prev [-] | | I think I am conflating micro-ops with microcode and your above comment is the correct way of thinking about it. |
|
|