anarazel 3 days ago
On the 48-core system, building Linux peaks at about 48GB/s; LLVM peaks at something like 25GB/s. The system has well over 450GB/s of memory bandwidth.
menaerus 2 days ago
> On the 48 core system, building linux peaks at about 48GB/s; LLVM peaks at something like 25GB/s

The LLVM peak is suspiciously low, since building LLVM is heavier than the kernel? Anyway, on my machine, which is a dual-socket 2x22-core Skylake-X, I get ~60GB/s for a pure release build without debug symbols (less memory pressure).
A release build with debug symbols is much heavier, and it is what I normally use during development, so my experience is probably biased towards that workload; there the peak is >50% larger, ~98GB/s.
I repeated the experiment with the Linux kernel, and I get almost the same figure as you do: ~48GB/s.
Now, that was the peak aggregate figure, but I was also interested in the single highest read/write bandwidth measured. For an LLVM/clang release build with debug symbols I get ~32GB/s write bandwidth and ~52GB/s read bandwidth.
This is, by the way, very close to what my socket can handle: store bandwidth is ~40GB/s, load bandwidth is ~80GB/s, and combined load-store bandwidth is ~65GB/s. So I think it is not unreasonable to say that there are compiler workloads that can be limited by memory bandwidth.

I have certainly worked with codebases even heavier than LLVM, and even though I did not do the measurements back then, my gut feeling was that the bandwidth was being consumed. Some translation units would literally sit "compiling" for a few minutes with no visible progress. I agree that random-access memory patterns, and the latency those patterns incur, are also a cost that needs to be added to this cost function.

My initial comment on this topic was: I don't really believe that the bottleneck in compilation for larger codebases (of course not on _any_ given machine) is on the compute side, and therefore I don't see how modules are going to fix any of this.
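For reference, per-socket figures like the ~40GB/s store / ~80GB/s load limits above are typically obtained with a STREAM-style microbenchmark. Here is a minimal sketch in C with OpenMP; the 1 GiB array size, repetition count, and build flags are my own arbitrary choices for illustration, not anything measured in this thread:

    /* Minimal STREAM-style bandwidth sketch (assumes x86-64 Linux, GCC).
     * Build: gcc -O3 -fopenmp -march=native bw_sketch.c -o bw_sketch */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N    (128UL * 1024 * 1024)  /* 128M doubles = 1 GiB per array */
    #define REPS 10

    int main(void) {
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        if (!a || !b) return 1;

        /* First-touch init so pages land on the NUMA node of the thread
           that will later stream over them. */
        #pragma omp parallel for
        for (size_t i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

        /* Read bandwidth: sum-reduce one array (checksum keeps the loop
           from being optimized away). */
        double sum = 0.0;
        double t0 = omp_get_wtime();
        for (int r = 0; r < REPS; r++) {
            #pragma omp parallel for reduction(+:sum)
            for (size_t i = 0; i < N; i++) sum += a[i];
        }
        double t1 = omp_get_wtime();
        printf("read : %6.1f GB/s (checksum %g)\n",
               REPS * N * sizeof(double) / (t1 - t0) / 1e9, sum);

        /* Write bandwidth: fill one array. Plain stores also trigger a
           read-for-ownership per cache line, so real DRAM traffic is
           roughly double the figure printed here; non-temporal stores
           would be needed to isolate pure write bandwidth. */
        t0 = omp_get_wtime();
        for (int r = 0; r < REPS; r++) {
            #pragma omp parallel for
            for (size_t i = 0; i < N; i++) b[i] = 0.0;
        }
        t1 = omp_get_wtime();
        printf("write: %6.1f GB/s\n",
               REPS * N * sizeof(double) / (t1 - t0) / 1e9);

        free(a);
        free(b);
        return 0;
    }

Run as-is this gives a whole-machine number; to get a per-socket figure like the ones quoted above, pin the run to one node, e.g. with numactl --cpunodebind=0 --membind=0.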