BirAdam 7 days ago:
First, Apple did an excellent job optimizing their software stack for their hardware. This is something that few companies have the ability to do, as most target a wide array of hardware. It is even more impressive given the scale of Apple's lineup: the same kernel runs on a Watch and a Mac Studio.

Second, the x86 platform has a lot of legacy, and each x86 instruction is translated into RISC-like micro-ops. This is an inherent penalty that Apple doesn't have to pay, and it is also why Rosetta 2 can achieve "near native" x86 performance; both platforms translate the x86 instructions.

Third, there are some architectural differences even if the instruction decoding steps are removed from the discussion. Apple Silicon has a huge out-of-order buffer, and it's 8-wide versus x86's 4-wide. From there, the actual logic is different, the design is different, and the packaging is different. AMD's Ryzen AI Max 300 series does get close to Apple by using many of the same techniques, like unified memory and tossing everything onto the package; where it does lose, it's due to all of the other differences.

In the end, if people want crazy efficiency, Apple is a great answer and delivers solid performance. If people want the absolute highest performance, then something like Ryzen Threadripper, EPYC, or even the higher-end consumer AMD chips are great choices.
zipityzi 7 days ago:
This seems mostly misinformed.

1) Apple Silicon outperforms all laptop CPUs in the same power envelope on 1T on industry-standard tests, so it's not predominantly due to "optimizing their software stack". SPECint, SPECfp, Geekbench, Cinebench, etc. all show major improvements.

2) x86 also relies heavily on micro-ops to greatly improve performance. This is not a "penalty" in any sense.

3) x86 is now six-wide, eight-wide, or nine-wide (with asterisks) for decode width on all major Intel and AMD cores. The myth of x86 being stuck at four-wide has long been disproven.

4) Large buffers, L1/L2/L3 caches, etc. are not exclusive to any CPU microarchitecture. Anyone can increase them; the question is how much your core benefits from the larger structures.

5) Ryzen AI Max 300 (Strix Halo) gets nowhere near Apple on 1T perf/W and still loses on 1T perf. Strix Halo uses slower CPUs than the beastly 9950X below.

SPEC2017 1T (int, fp, geomean):

    Fanless iPad M4 P-core:   10.61, 15.58, 12.85
    AMD 9950X (Zen 5):        10.14, 15.18, 12.41
    Intel 285K (Lion Cove):    9.81, 12.44, 11.05

Sources: https://youtu.be/2jEdpCMD5E8?t=185, https://youtu.be/ymoiWv9BF7Q?t=670

The 9950X and 285K eat 20W+ per core for that 1T perf; the M4 uses ~7W. Apple has a node advantage, but no node on Earth gives you 50% less power. There is no contest.
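As a quick sanity check of those figures (a sketch, not a benchmark): each quoted geomean is just the geometric mean of the int and fp scores, and dividing by the quoted per-core power gives a rough points-per-watt. The ~7 W and 20 W+ wattages are the approximations stated above, not measurements of mine.

    # Sanity-check the SPEC2017 numbers quoted above (they match to within rounding).
    # Per-core wattages are the rough "~7 W" / "20 W+" figures from the comment.
    from math import sqrt

    chips = {
        "Apple M4 P-core":        (10.61, 15.58, 7.0),   # int, fp, ~watts/core
        "AMD 9950X (Zen 5)":      (10.14, 15.18, 20.0),
        "Intel 285K (Lion Cove)": (9.81, 12.44, 20.0),
    }

    for name, (i, f, watts) in chips.items():
        geo = sqrt(i * f)  # geometric mean of the two suite scores
        print(f"{name}: geomean {geo:.2f}, ~{geo / watts:.2f} points/W")

That works out to roughly 1.8 points/W for the M4 against ~0.6 for the 9950X, about a 3x perf/W gap at these operating points, which is the "no contest" above.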
JJJollyjim 7 days ago:
Apple CPUs do decode instructions into micro-ops.
jcranmer 7 days ago:
> Second, the x86 platform has a lot of legacy, and each x86 instruction is translated into RISC-like micro-ops. This is an inherent penalty that Apple doesn't have to pay, and it is also why Rosetta 2 can achieve "near native" x86 performance; both platforms translate the x86 instructions.

Can we please stop with this myth? Every superscalar processor is doing the exact same thing: converting the ISA into the µops (which may involve fission or fusion) that are actually serviced by the execution units. It doesn't matter if the ISA is x86, ARM, or RISC-V; it's a feature of the superscalar architecture, not the ISA itself.

The only reason this canard keeps coming up is that the RISC advocates thought superscalar was impossible to implement for a CISC architecture, and x86 proved them wrong, so instead they pretend that x86 somehow cheats by converting itself to RISC internally.
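To make the fission/fusion point concrete, here is a toy sketch. The instruction spellings, µop names, and the specific fusion pair are made up for illustration; no real decoder works like this, but both directions happen on both ISA families.

    # Toy model of uop cracking, not any real decoder's behavior.

    def decode_x86(inst: str) -> list[str]:
        # Fission: one memory-operand add cracks into load + ALU + store uops.
        if inst == "add [rdi], rax":
            return ["uop.load  tmp <- [rdi]",
                    "uop.add   tmp <- tmp + rax",
                    "uop.store [rdi] <- tmp"]
        return ["uop." + inst]

    def decode_arm(insts: list[str]) -> list[str]:
        # Fusion: an adjacent cmp + b.eq pair merges into one compare-and-branch
        # uop, the kind of pairing ARM designs also perform.
        uops, i = [], 0
        while i < len(insts):
            if insts[i].startswith("cmp") and i + 1 < len(insts) \
                    and insts[i + 1].startswith("b.eq"):
                uops.append(f"uop.cmp_branch({insts[i]} ; {insts[i + 1]})")
                i += 2
            else:
                uops.append("uop." + insts[i])
                i += 1
        return uops

    print(decode_x86("add [rdi], rax"))             # 1 instruction -> 3 uops
    print(decode_arm(["cmp x0, #0", "b.eq done"]))  # 2 instructions -> 1 uop

Either way, the execution units only ever see µops; whether the ISA boundary sits slightly above or below them is a design detail, not a tax unique to x86.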
rerdavies 6 days ago:
The "RISC" thing is an experiment that failed in the 90s. There's nothing particularly RISCy about the ARM instruction set. It is a pretty darned complicated instruction set. ARM processors ALSO decode instructions to micro-ops. And Apple chips do too. Pretty much a draw. The first stage in the execution pipelines of all modern processors is a a decode stage. | |||||||||||||||||||||||||||||||||||||||||||||||