jeffbee | 7 months ago
Load/store performance differs widely across implementations. On Apple Silicon, for example the M1 Max, a single core can stream about 100GB/s all by itself. This is a significant advantage over competing designs that are built to reach that kind of memory bandwidth only with all-core workloads. For example, five generations of Intel Xeon processors, from Sandy Bridge through Skylake, were built to achieve about 20GB/s streams from a single core. That is one reason the M1 was so exceptional when it was released: its single-thread (1T) memory performance is much better than what you get from everyone else. As for claims that the M1 Max has > 400GB/s of memory bandwidth, that isn't achievable from the CPUs alone. You need all CPUs and GPUs running full tilt to hit that limit. In practice you can hit maybe 250GB/s from the CPUs if you bring them all to bear, including the efficiency cores. This is still extremely good performance.
majke | 7 months ago | parent
I don't think a single M1 CPU can do 100GB/s. This source says 68GB/s peak: https://www.anandtech.com/show/16252/mac-mini-apple-m1-teste...
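For anyone who wants to sanity-check single-thread figures like these on their own machine, here is a rough sketch of a one-thread copy test. It is not how any of the numbers quoted in this thread were measured. The function name `measure_copy_bandwidth` and its parameters are made up for illustration, and it assumes CPython, where a same-size slice assignment between two bytearrays should boil down to a plain memory copy. A tuned STREAM-style kernel in C would likely report higher numbers, so treat the result as a lower bound at best.

```python
import time


def measure_copy_bandwidth(size_bytes=128 * 1024 * 1024, repeats=5):
    """Return the best observed single-threaded copy bandwidth in GB/s.

    Hypothetical helper for illustration only; buffer size and repeat
    count are arbitrary choices, not values taken from the thread.
    """
    # Fill the source with non-zero bytes so its pages are really
    # backed by memory before timing starts.
    src = bytearray(b"\xa5") * size_bytes
    dst = bytearray(size_bytes)

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-size slice assignment, one thread
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # first pass pays page-fault costs

    # A copy reads size_bytes and writes size_bytes, so count both.
    return 2 * size_bytes / best / 1e9


if __name__ == "__main__":
    print(f"{measure_copy_bandwidth():.1f} GB/s (read + write, one thread)")
```

The buffers need to be much larger than the last-level cache for this to say anything about DRAM rather than cache bandwidth, which is why the default is well above typical cache sizes.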