torginus a day ago

The problem is that, due to how templates work, each compilation unit will end up with its own copy of each templated function it uses, which creates extra work, code bloat, etc.
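To illustrate the per-unit copies: a minimal sketch, assuming a hypothetical `square` template that would normally live in a shared header. Every .cpp file that calls `square<int>` instantiates it again unless an explicit instantiation declaration (`extern template`) tells that unit to skip it. The names here are made up, and the definition is placed in the same file only so the sketch stands alone.

```cpp
#include <cassert>

// Hypothetical header content: a function template every .cpp file would see.
template <typename T>
T square(T x) { return x * x; }

// Explicit instantiation declaration: asks this translation unit not to
// implicitly instantiate square<int>; a definition is promised elsewhere.
extern template int square<int>(int);

// Explicit instantiation definition: normally placed in exactly one .cpp file.
// It sits in the same file here only to keep the sketch self-contained.
template int square<int>(int);

// square<int> uses the single explicit instantiation above;
// square<double> is still instantiated implicitly, since nothing suppresses it.
inline void self_check() {
    assert(square(3) == 9);
    assert(square(2.5) == 6.25);
}
```

This only covers instantiations you list by hand, and the header still has to be parsed in every unit, so it is a partial workaround rather than a fix for the parsing cost.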

The compiler also doesn't really inline or optimize functions as well across object boundaries without link-time optimization.

But the linker is single-threaded and notoriously slow. With LTO, I wouldn't be surprised if it took as much time as the whole unity build, and the results are often suboptimal.

Also, C++ syntax is notoriously hard and slow to parse; the Clang frontend takes almost as much time to run as LLVM itself.

So probably modules would help a lot with parallel parsing, but that would help unity builds just as much.

jjmarr 18 hours ago | parent [-]

> each compilation unit will end up with its own copy of each templated function it uses, which creates extra work, code bloat, etc.

Yes, that's what causes the parsing bottleneck. Unity builds don't need to create multiple copies of templated functions.

C++20 modules could fix that because the function is parsed before substitution. TBD on whether that optimization works yet; I tried it on Clang 18 and it didn't.

> But the linker is single threaded and notoriously slow

I think most linkers have parallel LTO and `mold` provides actual parallel linking.