Aurornis | 3 hours ago
> The big question then is, why are ARM desktop (and server?) cores so far behind on wider SIMD support?

Very wide SIMD instructions require a lot of die space and a lot of power. The AVX-512 implementation in Intel's Knights Landing took up 40% of the die area (source: https://chipsandcheese.com/p/knights-landing-atom-with-avx-5... which is an excellent site for architectural analysis).

Most ARM desktop/mobile parts are designed to be low power and low cost. Spending valuable die area on large logic blocks for instructions that are rarely used isn't a good tradeoff for consumer apps.

Most ARM server parts are designed to have very high core counts, which requires each individual core to be small. Adding very wide SIMD support would grow the die area of each core considerably and reduce the number that could fit into a single package. Supporting 256-bit or 512-bit instructions would be hard to do without interfering with the other design goals for those parts. Even Intel has started dropping support for the wider AVX instructions in its smaller efficiency cores as a tradeoff to fit more of them into the same chip. For many workloads this is actually a good tradeoff.

As this article mentions, many common use cases for high-throughput SIMD code are just moving to GPUs anyway.
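To illustrate why "rarely used" ends up being literal: wide-SIMD paths in portable software usually sit behind a runtime dispatch like the minimal sketch below. The function names are made up for illustration, and it assumes GCC/Clang builtins on x86-64; it is not code from the thread.

    #include <immintrin.h>
    #include <stddef.h>

    /* Portable scalar fallback, always safe to run. */
    static void add_scalar(float *dst, const float *a, const float *b, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    }

    /* AVX-512 path. The target attribute lets the compiler emit 512-bit code
       even when the rest of the program is built for a baseline x86-64 target. */
    __attribute__((target("avx512f")))
    static void add_avx512(float *dst, const float *a, const float *b, size_t n) {
        size_t i = 0;
        for (; i + 16 <= n; i += 16) {            /* 16 floats per 512-bit register */
            __m512 va = _mm512_loadu_ps(a + i);
            __m512 vb = _mm512_loadu_ps(b + i);
            _mm512_storeu_ps(dst + i, _mm512_add_ps(va, vb));
        }
        for (; i < n; i++)                        /* scalar tail */
            dst[i] = a[i] + b[i];
    }

    void add(float *dst, const float *a, const float *b, size_t n) {
        /* Dispatch at runtime: the wide path only runs on CPUs that report it. */
        if (__builtin_cpu_supports("avx512f"))
            add_avx512(dst, a, b, n);
        else
            add_scalar(dst, a, b, n);
    }

On hardware without the feature, the wide path is dead code, which mirrors the tradeoff the efficiency cores make in silicon.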
aseipp | 2 hours ago
Knights Landing is a major outlier; the cores there were extremely small and had very few resources dedicated to them (e.g. 2-wide decode) relative to the vector units, so of course that will dominate. You aren't going to see 40% of the die dedicated to vector register files on anything looking like a modern, wide core. The entire vector unit (with SRAM) will be in the ballpark of, like, the cumulative L1/L2; a 512-bit register is only a single 64-byte cache line, after all.
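To put rough numbers on that point (the 32-register count is architectural for AVX-512; the L1D size is just a typical modern value, not from the comment):

    #include <stdio.h>

    int main(void) {
        int reg_bytes   = 512 / 8;        /* one 512-bit register = 64 bytes = one cache line */
        int arch_state  = 32 * reg_bytes; /* 32 architectural ZMM registers = 2048 bytes (2 KiB) */
        int typical_l1d = 48 * 1024;      /* a typical modern L1D, for scale */
        printf("register: %d B, vector register state: %d B, L1D: %d B\n",
               reg_bytes, arch_state, typical_l1d);
        return 0;
    }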
kbolino | 2 hours ago
The rarity of use is a chicken-and-egg problem, though. The hardware makers consider it a waste because the software doesn't use it, and the software makers won't use it because it's not widely supported enough. Apple and Qualcomm not supporting it at all, on any of their hardware tiers, just exacerbates the problem.

I think this is a good explanation for why mobile devices lack it, and even why, say, a MacBook Air or Mac Mini lacks it, but not why a MacBook Pro or Mac Studio lacks it.

It does seem like server hardware is adopting SVE at least, even if it's not always paired with wider registers. There are lots of non-math-focused instructions in there that benefit many kinds of software that can't be moved to a GPU.
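For what it's worth, SVE was designed so the software side of that chicken-and-egg doesn't depend on a fixed width: the same binary runs on 128-bit and 512-bit implementations. A minimal vector-length-agnostic sketch using the ACLE intrinsics (the add_arrays name is just for illustration, and it assumes a compiler with SVE support, e.g. -march=armv8-a+sve):

    #include <arm_sve.h>
    #include <stdint.h>

    /* Adds two float arrays using whatever vector length the hardware provides. */
    void add_arrays(float *dst, const float *a, const float *b, int64_t n) {
        for (int64_t i = 0; i < n; i += svcntw()) {   /* svcntw() = floats per vector */
            svbool_t pg = svwhilelt_b32(i, n);        /* predicate masks the tail */
            svfloat32_t va = svld1_f32(pg, a + i);
            svfloat32_t vb = svld1_f32(pg, b + i);
            svst1_f32(pg, dst + i, svadd_f32_m(pg, va, vb));
        }
    }

The same object code would automatically use wider registers if a future core shipped them, which is the main thing SVE offers over NEON here.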
wtallis | 2 hours ago
> The AVX-512 implementation in Intel's Knights Landing took up 40% of the die area

That chip family was pretty much designed to provide just enough CPU power to keep the vector engines fed. So that 40% is an upper bound: what you get when you try to build a GPU out of somewhat-specialized CPU cores (which was literally the goal of the first generation of that lineage).

For a general-purpose chip, there's no reason to spend that large a fraction of the area on the vector units. Something like the typical ARM server chip with lots of weak cores definitely doesn't need each core to have a vector unit capable of doing 512-bit operations in a single cycle, and would probably be better off sharing vector units between multiple cores. For chips with large, high-performance CPU cores (e.g. x86), a 512-bit vector unit will still noticeably increase the size of a core, but won't actually dwarf the rest of the core the way it did for Xeon Phi.
formerly_proven | 2 hours ago
KNL is an almost 15-year-old uarch, expressly designed to compete with dedicated SIMD processors (GPGPU); dedicating the die to vector units is the point of that chip.
happyPersonR | 3 hours ago
Yeah, this seems likely, but with all the LLM stuff it might be an outdated assumption. Buy new chips next year! Haha :)