menaerus 3 days ago
I doubt you can saturate the bandwidth with a dual-socket configuration that has 10 cores per socket. Perhaps if you have very recent cores, which I believe you don't, but the Intel design hasn't been that good. What you're also measuring in your experiment, and what needs to be taken into account, is the latency across NUMA nodes, which is ridiculously high: 1.5x to 2x the local-node latency, usually amounting to ~130 ns. Because of this, in NUMA configurations you usually need more (Intel) cores to saturate the bandwidth. I know because I have one sitting at my desk. Memory bandwidth saturation usually begins at ~20 cores with an Intel design that is roughly 5 years old. I might be off with that number, but it's roughly something like that. Any additional cores you have burning cycles are just sitting there, waiting in line for the bus to become free.
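To put rough numbers on the latency argument, here is a back-of-the-envelope sketch based on Little's law: the bandwidth a single core can sustain is roughly (outstanding cache-line misses x 64 bytes) / memory latency. The 10 outstanding misses per core, the 80 ns local latency, and the 100 GB/s peak bandwidth are illustrative assumptions, not measurements from any machine in this thread, and the function names are made up for the example. Hardware prefetchers can raise the effective concurrency, so treat the result as an order-of-magnitude estimate only.

```python
import math

CACHE_LINE_BYTES = 64  # typical x86 cache line size


def per_core_bandwidth_gbs(outstanding_misses: int, latency_ns: float) -> float:
    """Little's law estimate of one core's sustainable memory bandwidth.

    bytes in flight / latency in ns gives bytes per ns, which equals GB/s.
    """
    return outstanding_misses * CACHE_LINE_BYTES / latency_ns


def cores_to_saturate(peak_gbs: float, outstanding_misses: int, latency_ns: float) -> int:
    """Smallest number of cores whose combined estimate reaches peak_gbs."""
    per_core = per_core_bandwidth_gbs(outstanding_misses, latency_ns)
    return math.ceil(peak_gbs / per_core)


if __name__ == "__main__":
    ASSUMED_PEAK_GBS = 100.0   # assumption: aggregate DRAM bandwidth
    ASSUMED_MISSES = 10        # assumption: outstanding misses per core

    # ~130 ns is the remote-node latency quoted above; 80 ns is an assumed local value.
    for label, latency_ns in (("local (assumed 80 ns)", 80.0), ("remote (~130 ns)", 130.0)):
        bw = per_core_bandwidth_gbs(ASSUMED_MISSES, latency_ns)
        n = cores_to_saturate(ASSUMED_PEAK_GBS, ASSUMED_MISSES, latency_ns)
        print(f"{label}: {bw:.2f} GB/s per core, {n} cores to reach {ASSUMED_PEAK_GBS:.0f} GB/s")
```

Under these assumptions, the higher remote latency cuts per-core bandwidth from 8 GB/s to about 4.9 GB/s, so the core count needed to reach the same peak rises from 13 to 21, which is in the same ballpark as the ~20 cores mentioned above.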