Performance optimization is not always just low-hanging fruit. When you start trying to optimize something, there's often large bottlenecks to clean up -- "we can add a hashmap here to turn this O(n^2) function into O(n) and improve performance by 10x" kind of thing. But once you've got all the easy stuff, what's left is the little things -- a 1% improvement here, a 0.1% improvement there. If you can find 10 of those 1% improvements, now you've got a 10% performance improvement. 0.2 seconds on its own isn't that much, but the reason mold is so fast is because the author has found a lot of 0.2 second improvements.

And even disregarding that, the linked LKML post mentions LLVM/clang at a case of building a 4GB executable. If you've ever built the LLVM project from source, there's about 50ish (?) binaries that need to be linked at the end of the build process -- it's not just clang, but all sorts of other tools, intermediate libraries and debugging utilities. So that is an example of a workload with "multi-GB executables being built at such a pace" -- saving 0.2 seconds per executable saves something like 10 seconds on the build.

▲

magicalhippo 7 months ago | parent [-]

I'm well aware of the joys of optimization, I just haven't come across someone building multi-GB executables at a pace where milliseconds spent linking mattered.

To me that's an exotic workload which sounds interesting, hence why I'm curious.

▲

NobodyNada 7 months ago | parent [-]

Well, keep in mind that the full linking step has to be done at the end of an incremental build. So if you're a developer actively working on a project with a 4GB executable, that linking time is part of your edit-compile-test cycle, and you have to wait for it every time you change a line of code.

The benchmarks on mold's README show that GNU gold takes 33 seconds to link clang, whereas mold takes 1.3 seconds. If you're a developer working on Clang, that's a pretty serious productivity improvement.

	▲	kardos 7 months ago \| parent [-]
		> Well, keep in mind that the full linking step has to be done at the end of an incremental build. Does it have to be that way? For an incremental build, particularly in the "not much has changed except some function contents" case, is it plausible to reuse the previous linker output and update the incrementally updated part?