addaon 3 days ago

There's another difference -- willingness to actually pay for silicon. The M1 Max is a 432 mm^2 laptop chip built on a 5 nm process. Contrast that with AMD's "high end" Ryzen 7 8845HS at 178 mm^2 on a 4 nm process. Even the M1 Pro at 245 mm^2 is bigger than that. More area means not just more peak performance, but the ability to use wider paths at lower speeds to maintain performance at lower power. 432 mm^2 is friggin' huge for a laptop part, and it's really hard to compete with what that can do on any metric besides price.

MindSpunk 3 days ago | parent | next [-]

Comparing the M1 Max to a Ryzen 7 8845HS isn't fair, because the M1 chip also includes a _massive_ GPU tile. The 8845HS has a comparatively tiny iGPU, since most vendors taking that part pair it with a separate dGPU package.

A better comparison is the total package area of the AI Max+ 395, which includes a 16-core CPU plus a massive GPU tile: ~448 mm^2 across all 3 chiplets.

tracker1 3 days ago | parent | prev | next [-]

Apple's SoC does a bit more than AMD's, such as including the SSD controller. I don't know if Apple is grafting different nodes together as chiplets the way AMD does on desktop.

The area has nothing to do with peak performance... based on the node, it has to do with the amount of components you can cram into a given space. The CRAY-1 CPU was massive compared to both of your examples, but doesn't come close to either in terms of performance.

Also, the Ryzen AI Max+ 395 is top dog on the AMD mobile CPU front and is around 308 mm^2 combined.

addaon 3 days ago | parent [-]

> The area has nothing to do with peak performance... based on the node, it has to do with the amount of components you can cram into a given space.

Of course it does. For single-threaded performance, the knobs I can turn are clock speed (minimal area impact for higher-speed standard cells, large power impact), core width (significant area impact for decoder, execution resources, etc., smaller power impact), and cache (huge area impact, smaller power impact). So if I want higher single-threaded performance on a power budget, area helps. And of course for multi-threaded performance the knobs I have are number of cores, number of memory controllers, and last-level cache size, all of which drive area. There's a reason Moore's law was so often interpreted as talking about performance and not transistor count -- transistor count gives you performance. If you're willing to build a 432 mm^2 chip instead of a 308 mm^2 chip iso-process, you're basically gaining a half-node of performance right there.
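
To put a rough number on the "half-node" claim, here is a minimal sketch using only the two die areas quoted in this thread. It assumes the classic rule of thumb that a full node step doubles transistor density, so a half node is about sqrt(2)x; modern node names do not track that neatly, and the function and constant names are illustrative only.

```python
import math

# Die areas in mm^2, as quoted in the thread (not independently verified).
M1_MAX_MM2 = 432.0
AI_MAX_395_MM2 = 308.0

# Assumption: a "full node" is the classic ~2x density step, so a
# "half node" is ~sqrt(2)x. Real process nodes vary.
FULL_NODE_DENSITY_GAIN = 2.0


def area_ratio(bigger_mm2: float, smaller_mm2: float) -> float:
    """Return how many times more area the bigger die has."""
    return bigger_mm2 / smaller_mm2


def equivalent_node_steps(ratio: float) -> float:
    """Express an iso-process area ratio as a fraction of a full node step."""
    return math.log(ratio, FULL_NODE_DENSITY_GAIN)


ratio = area_ratio(M1_MAX_MM2, AI_MAX_395_MM2)
steps = equivalent_node_steps(ratio)
print(f"area ratio: {ratio:.2f}x, about {steps:.2f} of a full node")
# prints: area ratio: 1.40x, about 0.49 of a full node
```

This only compares transistor budget at equal density. It says nothing about how well either design spends that budget.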

tracker1 3 days ago | parent [-]

Transistor count does not equal performance. More transistors aren't necessarily going to speed up any random single-threaded bottleneck.

Again, the CRAY-1 CPU is around 42000 mm^2, so I'm guessing you'd rather run that today, right?

gigatexal 3 days ago | parent | prev [-]

True, the M1 Pro and Max chips were capable of 200GB/s and 400GB/s of bandwidth between the chip and the integrated memory. I don't think any desktop chips offered that at the time.
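
Those bandwidth figures fall out of simple arithmetic: transfer rate times bus width. A minimal sketch follows. The memory configurations are assumptions taken from commonly published specs, not from this thread: LPDDR5-6400 on a 256-bit bus for the M1 Pro and a 512-bit bus for the M1 Max, with dual-channel DDR4-3200 (128-bit) as a typical desktop of that era.

```python
def peak_bandwidth_gb_s(transfers_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


# Assumed configurations: (transfer rate in MT/s, bus width in bits).
configs = {
    "M1 Pro (LPDDR5-6400, 256-bit)": (6400, 256),
    "M1 Max (LPDDR5-6400, 512-bit)": (6400, 512),
    "Desktop (dual-channel DDR4-3200, 128-bit)": (3200, 128),
}

for name, (rate, width) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(rate, width):.1f} GB/s")
```

These are theoretical peaks. Sustained bandwidth in practice is lower, and how much of it the CPU cores alone can use is a separate question.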