Remix.run Logo
wtallis 2 hours ago

> The AVX-512 implementation in Intel's Knight's Landing took up 40% of the die area

That chip family was pretty much designed to provide just enough CPU power to keep the vector engines fed. So that 40% is an upper bound, what you get when you try to build a GPU out of somewhat-specialized CPU cores (which was literally the goal of the first generation of that lineage).

For a general purpose chip, there's no reason to spend that large a fraction of the area on the vector units. Something like the typical ARM server chips with lots of weak cores definitely doesn't need each core to have a vector unit capable of doing 512-bit operations in a single cycle, and probably would be better off sharing vector units between multiple cores. For chips with large, high-performance CPU cores (eg. x86), a 512-bit vector unit will still noticeably increase the size of a CPU core, but won't actually dwarf the rest of the core the way it did for Xeon Phi.