kimixa 5 hours ago

That's no guarantee it would succeed, though: AMD64 also cleaned up a number of warts on the x86 architecture, like adding more registers.

While I suspect the Intel equivalent would have done similar things (a break that big makes them the obvious things to do), there's no guarantee it wouldn't have been worse than AMD64. But I guess it could also have been "better" in retrospect.

And remember that at the time the Pentium 4 was very much struggling to deliver its advertised performance. One could argue that one of the major reasons the AMD64 ISA took off is that the first devices supporting it were (generally) superior even in 32-bit mode.

EDIT: And I'm surprised it got as far as silicon. AMD64 was "announced" and the spec released before the Pentium 4 even shipped, over 3 years before the first AMD implementations could be purchased. I guess Intel thought they didn't "need" to be public about it? And the AMD64 extensions cost a rather non-trivial amount of silicon and engineering effort to implement. Did the plan for Itanium change late enough in the P4 design that it couldn't be removed? Or perhaps this all implies it was a much less far-reaching (and so less costly) design?

ghaff 3 hours ago | parent | next [-]

As someone who followed IA64/Itanium pretty closely, it's still not clear to me to what degree Intel (or at least groups within Intel) genuinely thought IA64 was a better approach, and to what degree they simply wanted to get out from the existing cross-licensing deals with AMD and others. There were certainly also constraints imposed by existing partnerships, notably with Microsoft.

ajross 3 hours ago | parent [-]

Both are likely true. It's easy to wave it away in hindsight, but there was genuine energy and excitement around the architecture in its early days. And while the first chips were late and built on behind-the-cutting-edge processes, they were actually very performant; the FPU numbers were even world-beating, and parallel VLIW dispatch really helped there.

Lots of people loved Itanium and wanted to see it succeed. But surely the business folks had their own ideas too.

kimixa 3 hours ago | parent [-]

Yes: VLIW seems to lend itself to computation-heavy code; it's used to this day in many DSP architectures, and arguably influenced many GPU architectures.
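
A hedged sketch of why (illustrative C, not tied to any particular DSP): a kernel with several independent accumulator chains exposes its parallelism statically, which is exactly what a VLIW scheduler wants.

    /* Illustrative only: four independent multiply-accumulate chains.
     * A VLIW compiler can pack these into parallel issue bundles at
     * compile time instead of relying on out-of-order hardware.
     * Assumes n is a multiple of 4, for brevity. */
    void dot4(const float *a, const float *b, float *out, int n) {
        float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        for (int i = 0; i < n; i += 4) {
            s0 += a[i]     * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        *out = s0 + s1 + s2 + s3;
    }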

chasil 3 hours ago | parent | prev [-]

The times I have used "gcc -S" on my code, I have never seen the additional registers used.

I understand that r8-r15 require a REX prefix, which is hostile to code density.

I've never done it with -O2. Maybe that would surprise me.
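
For reference, a minimal test case one could try (file and function names made up); compile with "gcc -O2 -S rex.c" and look for r8/r9 in rex.s:

    /* Under the SysV AMD64 ABI the fifth and sixth integer arguments
     * already arrive in r8 and r9, so they appear in the output even
     * before register pressure forces the allocator past the legacy
     * eight. */
    long mix(long a, long b, long c, long d, long e, long f) {
        long t1 = a * b, t2 = c * d, t3 = e * f;
        return t1 * t2 + t2 * t3 + t3 * t1;
    }

On the density point: a 32-bit `mov eax, ebx` encodes in 2 bytes (89 d8), while the same move into `r8d` needs a REX prefix (41 89 d8), so the extra byte is real.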

astrange 3 hours ago | parent | next [-]

You should be able to see it. REX prefixes cost a lot less than register spills do.

If you mean literally `gcc -S`, that's -O0, which is worse than merely unoptimized: it basically keeps everything in memory to make debugging easier. -Os is the one that produces readable, sensible asm.
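
A minimal sketch of the comparison (file and function names are made up):

    /* Compile the same trivial function two ways and diff the output:
     *   gcc -S -O0 add.c   # every value round-trips through the stack
     *   gcc -S -Os add.c   # values stay in registers; readable asm
     */
    int add3(int a, int b, int c) {
        return a + b + c;
    }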

chasil 3 hours ago | parent [-]

Thanks, I'll give it a try.

o11c 2 hours ago | parent | prev [-]

Obviously it depends on how many live variables there are at any point. A lot of nasty loops involve relatively few non-memory operands, especially without inlining (though even then, being better able to control ABI-mandated spills helps).

But it's guaranteed to use `r8` and `r9` for a function that takes 5 or 6 integer arguments (including unpacked 128-bit structs, which count as 2 arguments), or 3 or 4 arguments (not sure about unpacking) for Microsoft. And `r10` is used if you make a system call on Linux; a sketch of that is below.
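
A minimal sketch of the syscall case (x86-64 Linux, GCC inline asm; the wrapper name is made up). The `syscall` instruction clobbers rcx with the return address, which is why the kernel convention moves the fourth argument into `r10`:

    #include <sys/syscall.h>

    /* Kernel convention: number in rax; args in rdi, rsi, rdx, r10,
     * r8, r9 (vs. rcx for the fourth arg in ordinary SysV calls).
     * pread64 has four arguments, so the offset lands in r10. */
    long raw_pread(int fd, void *buf, unsigned long len, long off) {
        long ret;
        register long arg4 __asm__("r10") = off; /* 4th arg pinned to r10 */
        __asm__ volatile ("syscall"
                          : "=a"(ret)
                          : "0"((long)SYS_pread64), "D"((long)fd),
                            "S"(buf), "d"(len), "r"(arg4)
                          : "rcx", "r11", "memory");
        return ret;
    }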