menaerus 3 days ago

Chill out. The compiler is a heavily multithreaded program that utilizes all of the cores in the C and C++ compilation model. Since each thread is doing work, it will obviously also consume memory, no? Computing 101. The total amount of data being touched for reads and writes is called the dataset. For larger codebases the dataset does not fit into the cache, so the data starts to live in main memory, and accessing data in main memory consumes the system's memory bandwidth. Try running 64 threads on a 64-core system, all touching data in memory, and you will see for yourself.
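
A minimal sketch of the experiment being suggested, assuming a machine with enough RAM for the working set (the buffer sizes and names are illustrative, not from the thread):

    // bandwidth_stress.cpp - one streaming reader per thread; reports aggregate read bandwidth.
    // Build: g++ -O2 -std=c++17 -pthread bandwidth_stress.cpp
    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <cstdlib>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main(int argc, char** argv) {
        // Thread count from argv[1] if given, otherwise one per hardware thread.
        unsigned nthreads = (argc > 1) ? static_cast<unsigned>(std::atoi(argv[1]))
                                       : std::max(1u, std::thread::hardware_concurrency());
        const std::size_t elems = 16 * 1024 * 1024;   // 128 MiB of doubles per thread

        // One private buffer per thread, large enough that nothing stays in cache.
        std::vector<std::vector<double>> buffers(nthreads, std::vector<double>(elems, 1.0));
        std::vector<double> sums(nthreads, 0.0);

        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < nthreads; ++t)
            workers.emplace_back([&, t] {
                double s = 0.0;
                for (double v : buffers[t]) s += v;   // pure streaming read
                sums[t] = s;
            });
        for (auto& w : workers) w.join();
        std::chrono::duration<double> secs = std::chrono::steady_clock::now() - start;

        double gbytes = double(nthreads) * elems * sizeof(double) / 1e9;
        std::cout << nthreads << " threads: " << gbytes / secs.count()
                  << " GB/s aggregate (checksum " << sums[0] << ")\n";
    }

Running it once with a single thread and then once with one thread per core usually shows the per-thread number dropping sharply, which is exactly the contention being described.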

compiler-guy 3 days ago | parent [-]

Compilers are typically not multithreaded. LLVM certainly isn't, although its linker is. C++ builds are usually many single-threaded compilation processes running in parallel.

menaerus 3 days ago | parent [-]

You're nitpicking; that's what I meant. Many processes in parallel or many threads in parallel: the former will achieve better memory utilization. Regardless, it doesn't invalidate what I said.

thechao 3 days ago | parent | next [-]

I was going to reply directly to you, but the re-reply is fine. I don't think your conclusion is wrong, but your analysis is bogus AF. Compiler transforms are usually strongly superlinear (quadratic or cubic, or some NP-hard demon); even a Knuth-style fast pass is going to traverse the entire IR tree under observation. The thing is, the IR tree under observation is usually pretty small: while it won't fit in the innermost cache, it's almost certainly not in main memory after the first sweep. Subsequent trees will be somewhere in the far reaches of memory... but there's an awful lot of work between fetching trees.

groos 3 days ago | parent | prev | next [-]

The two main parts of a typical C++ compiler are the front-end, which handles language syntax and semantic analysis, and the back-end, which handles code generation. C++ makes it difficult to implement the front-end as a multithreaded program because it has context‑sensitive syntax (as does C). The meaning of a construct can change depending on whether a name encountered during parsing refers to an existing declaration or not.
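
The classic illustration of that context sensitivity (the names here are made up for the example): whether "a * b;" is a declaration or an expression cannot be decided until the parser knows what "a" refers to.

    namespace type_case {
        struct a { };
        void f() {
            a * b;        // declares 'b' as a pointer to struct 'a'
            (void)b;
        }
    }

    namespace value_case {
        int a = 1, b = 2;
        void f() {
            a * b;        // the same tokens parse as 'a' times 'b', an expression
        }
    }

Until "a" has been looked up in the symbol table, the parse itself is ambiguous, which is why parsing and name binding cannot be cleanly separated.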

As a result, parsing and semantic analysis cannot be easily divided into independent parts to run in parallel, so they must be performed serially. A modern implementation will typically carry out semantic analysis in phases, for example binding names first, then analyzing types, and so on, before lowering the resulting representation to a form suitable for code generation.

Generally speaking, declarations that introduce names into non‑local scopes must be compiled serially. This also makes the symbol table a limiting factor for parallelism, since it must be accessed in a mutually exclusive manner. _Some_ constructs can be compiled in parallel, such as function bodies and function template instantiations, but given that build systems already implement per‑translation‑unit parallelism, the additional effort is often not worthwhile.
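
A rough sketch of that constraint, with entirely hypothetical types and function names (Symbol, FunctionBody, declare, analyze_body are invented for illustration, not taken from any real compiler): declarations that touch the shared symbol table serialize behind a lock, while already-parsed function bodies could in principle be analyzed in parallel.

    #include <functional>
    #include <mutex>
    #include <string>
    #include <thread>
    #include <unordered_map>
    #include <vector>

    // Hypothetical toy structures, only to show where serialization is forced.
    struct Symbol       { std::string name; /* type, linkage, ... */ };
    struct FunctionBody { std::string name; /* parsed AST would live here */ };

    std::unordered_map<std::string, Symbol> symbol_table;
    std::mutex symbol_table_mutex;   // non-local declarations contend on this lock

    void declare(const Symbol& s) {
        std::lock_guard<std::mutex> lock(symbol_table_mutex);   // serial by necessity
        symbol_table.emplace(s.name, s);
    }

    void analyze_body(const FunctionBody& body) {
        (void)body;   // local analysis: reads the now-frozen symbol table, no writes,
                      // so each body is independent and can run on its own thread
    }

    int main() {
        // Phase 1: bind non-local names; effectively serial because of the lock.
        for (const char* n : {"f", "g", "h"}) declare({n});

        // Phase 2: function bodies are independent, so fan them out.
        std::vector<FunctionBody> bodies = {{"f"}, {"g"}, {"h"}};
        std::vector<std::thread> workers;
        for (const auto& b : bodies) workers.emplace_back(analyze_body, std::cref(b));
        for (auto& w : workers) w.join();
    }

As the comment notes, the build system already gets this kind of parallelism across translation units, which is why compilers rarely bother doing it internally.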

In contrast, a language like C# is designed with context‑free syntax. This allows a top‑level fast parse to break up the source file (there are no #include's in C#) into declarations that can, in principle, be processed in parallel. There will still be dependencies between declarations, and these will limit parallelism. But given that C# source files are a tiny fraction of the size of a typical C++ translation unit, even here parallel compilation is probably not a big win.

The C++ back-end can take advantage of multithreading far more than the front end. Once global optimizations are complete, the remaining work can be queued in parallel for code generation. MSVC works in exactly this way and provides options to control this parallelism. However, parallelism is limited by Amdahl’s Law, specifically the need to read in the IR generated by the front-end and to perform global optimizations.
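
For reference, the MSVC switches being alluded to are, as far as I recall (check the current docs rather than taking this comment's word for it):

    :: One cl.exe process per translation unit, up to 8 at a time:
    cl /MP8 /O2 a.cpp b.cpp c.cpp d.cpp

    :: Several code-generation threads within a single compilation:
    cl /O2 /cgthreads4 big_file.cpp

Roughly, /MP parallelizes across translation units the way a build system would, while /cgthreads parallelizes the optimization and code-generation work within one translation unit, which is the part described above.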

bluGill 3 days ago | parent | prev | next [-]

It isn't memory utilization, it's bandwidth. The CPU can only move so many bytes in and out of main memory and only has so much cache. Eventually the cores are fighting each other for access to the main memory they need. There is plenty of memory in the system; the CPU just can't get at enough of it.

NUMA (non-uniform memory access: basically give each CPU a separate bank of RAM, and if you need something that is in the other bank you have to ask the other CPU for it) exists because of this. I don't have access to a NUMA system to see how they compare. My understanding (which could be wrong) is that OS designers are still trying to figure out how to use them well, and they are not expected to do well for all problems.
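
On Linux, numactl is the usual way to poke at this; a couple of illustrative commands (./my_build_job is a placeholder, not a real tool):

    # Show the nodes, their CPUs and memory, and the inter-node distances.
    numactl --hardware

    # Pin a process and its allocations to node 0 so it never pays remote-memory latency.
    numactl --cpunodebind=0 --membind=0 ./my_build_job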

skeezyboy 3 days ago | parent | prev [-]

He's not nitpicking at all. Every workload is different. How do you even know the compiler is memory bound like you say it is? You're espousing general wisdom that doesn't apply in specific cases.