majke 2 hours ago

This has puzzled me for a while. The cited system has 2x89.6 GB/s of bandwidth, but a single CCD can do at most 64GB/s of sequential reads. Are claims like "Apple Silicon having 400GB/s" meaningless? I understand a typical single logical CPU can't do more than 50-70GB/s, and it seems like a group of CPUs typically shares a memory controller which is similarly limited.

To rephrase: is it possible to reach 100% memory bandwidth utilization with only 1 or 2 CPUs doing the work per CCD?

ryao 34 minutes ago | parent | next [-]

On Zen 3, I am able to use nearly the full 51.2GB/sec from a single CPU core. I have not tried using two as I got so close to 51.2GB/sec that I had assumed that going higher was not possible. Off the top of my head, I got 49-50GB/sec, but I last measured a couple years ago.
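For anyone who wants a rough single-core number of their own, here is a minimal sketch. It is not the benchmark behind the figures above: it times a bulk copy from Python's standard library, so it measures copy throughput (one read stream plus one write stream) rather than pure sequential reads. The function name and the 64 MiB default buffer size are my own choices; the buffer just needs to be comfortably larger than the last-level cache.

```python
import time

def copy_bandwidth_gbs(size_bytes=64 * 1024 * 1024, repeats=5):
    """Best-of-N single-threaded copy rate in GB/s (10^9 bytes per second)."""
    # Fill the source with non-zero data so its pages are really allocated;
    # a freshly zeroed buffer may be backed by a shared zero page.
    src = bytearray(b"\x01") * size_bytes
    dst = bytearray(size_bytes)
    view = memoryview(src)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = view  # bulk copy performed in C, not a Python-level loop
        best = min(best, time.perf_counter() - start)
    # Each pass reads size_bytes and writes size_bytes.
    return 2 * size_bytes / best / 1e9

if __name__ == "__main__":
    print(f"single-thread copy: {copy_bandwidth_gbs():.1f} GB/s")
```

Treat the result as a lower bound on what one core can pull. A tuned native benchmark with wide vector loads will usually get closer to the theoretical figure, and the hardware may move more data than this counts if stores first read the destination cache lines.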

By the way, if the cores were able to load things at full speed, they would be able to use 640GB/sec each. That is 2 AVX-512 loads (64 bytes each) per cycle at 5GHz. Of course, they are never able to do this due to memory bottlenecks. Maybe Intel’s Xeon Max series with HBM can, but I would not be surprised to see an unadvertised internal bottleneck there too. That said, it is so expensive and rare that few people will ever run code on one.
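The arithmetic behind those two numbers, as a small sketch. The 64 bytes per AVX-512 load follows from the 512-bit register width. The DRAM side assumes that 51.2GB/sec corresponds to dual-channel DDR4-3200 with 8-byte-wide channels; the comment above does not state the memory configuration, so that part is a guess. The function names are only for illustration.

```python
def peak_load_gbs(loads_per_cycle, bytes_per_load, clock_ghz):
    """Peak bytes a core could load per second, in GB/s."""
    return loads_per_cycle * bytes_per_load * clock_ghz

def dram_peak_gbs(channels, bytes_per_transfer, megatransfers_per_s):
    """Theoretical DRAM bandwidth in GB/s."""
    return channels * bytes_per_transfer * megatransfers_per_s / 1000

# 2 AVX-512 loads per cycle, 64 bytes each, at 5 GHz
core = peak_load_gbs(2, 64, 5.0)

# Assumed configuration: 2 channels x 8 bytes x 3200 MT/s (DDR4-3200)
dram = dram_peak_gbs(2, 8, 3200)

print(f"core load peak: {core:.1f} GB/s")
print(f"DRAM peak:      {dram:.1f} GB/s")
print(f"ratio:          {core / dram:.1f}x")
```

Under those assumptions, the load units could consume roughly 12.5 times what the memory system can deliver, which is why the DRAM side is the limit long before the core is.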

KeplerBoy 2 hours ago | parent | prev [-]

Isn't that 400 GB/s figure one that only applies when the GPU, with its much wider interface, is accessing the memory?