bumholes 15 hours ago

The code that does this is here, if anyone is curious:

https://github.com/llvm/llvm-project/blob/release/21.x/llvm/...

https://github.com/llvm/llvm-project/blob/release/21.x/llvm/...

vodou 15 hours ago | parent | next [-]

Almost 16000 lines in a single source code file. I find this both admirable and unsettling.

loeg 14 hours ago | parent | next [-]

Does it really matter where the lines are? 16,000 lines is still 16,000 lines.

vodou 13 hours ago | parent | next [-]

Even though I do find your indifference refreshing, I must say: it does matter for quite a few people.

neerajsi 10 hours ago | parent | next [-]

If you want to recognize all the common patterns, the code can get very verbose. But it's all still just one analysis or transformation, so it would be artificial to split it into multiple files. I haven't worked much in LLVM, but I'd guess that the external interface to these passes is pretty reasonable and hides a large amount of the complexity that took 16 kLOC to implement.
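
For a sense of why that gets verbose, here's a toy sketch (in Rust, with invented names -- not LLVM's actual C++ API) of matching a single algebraic pattern. Every real pattern needs similar guards for constants, types, and flags, and a pass like InstCombine recognizes thousands of them:

    // Toy SSA-style IR, for illustration only -- not LLVM's data structures.
    #[derive(Clone, Debug)]
    enum Inst {
        Const(i64),
        Mul(Box<Inst>, Box<Inst>),
        Shl(Box<Inst>, Box<Inst>),
        Var(String),
    }

    // Fold `x * 2^k` into `x << k`, but only when the guards hold.
    fn fold_mul_to_shl(inst: &Inst) -> Option<Inst> {
        let Inst::Mul(lhs, rhs) = inst else { return None };
        // Guard 1: the right operand must be a constant.
        let Inst::Const(c) = **rhs else { return None };
        // Guard 2: it must be a positive power of two.
        if c <= 0 || (c & (c - 1)) != 0 {
            return None;
        }
        // Guard 3: real code would also check types, nsw/nuw flags, use counts...
        let k = c.trailing_zeros() as i64;
        Some(Inst::Shl(lhs.clone(), Box::new(Inst::Const(k))))
    }

    fn main() {
        let expr = Inst::Mul(Box::new(Inst::Var("x".into())), Box::new(Inst::Const(8)));
        println!("{:?}", fold_mul_to_shl(&expr)); // Some(Shl(Var("x"), Const(3)))
    }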

MobiusHorizons 12 hours ago | parent | prev [-]

If you don’t rely on IDE features or completion plugins in an editor like vim, it can be easier to navigate tightly coupled complexity when it is all in one file. You can’t really scan it or jump to the right spot as easily as with smaller files, but in vim searching for the exact symbol under the cursor is a single-character shortcut, and that only works if the symbol is in the current buffer. This style of development works best for academic-style code with a small number (usually one or two) of experts who are familiar with the implementation, but in that context it’s remarkably effective. Not great for merge conflicts in frequently updated code, though.

jiggawatts 9 hours ago | parent | prev | next [-]

... yes.

If it were 16K lines of modular "compositional" code, or a DSL that compiles in some provably-correct way, that would make me confident. A single file with 16K lines of -- let's be honest -- unsafe procedural spaghetti makes me much less confident.

Compiler code tends to work "surprisingly well" because it's beaten to death by millions of developers throwing random stuff at it, so bugs tend to be ironed out relatively quickly, unless you go off the beaten path... then it rapidly turns out to be a mess of spiky brambles.

The Rust developers, for example, found a series of LLVM optimiser bugs related to noalias, because C/C++ code rarely used that attribute, while Rust can utilise it aggressively.
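
Roughly the kind of function where Rust's aliasing guarantees become noalias information for LLVM -- a hedged illustration, not one of the actual miscompiled cases:

    // Sketch: because `x: &mut i32` and `y: &i32` are guaranteed not to
    // alias, rustc can emit LLVM `noalias` for them (it was temporarily
    // disabled at times precisely because of those LLVM bugs), so LLVM
    // may assume the store through `x` cannot change `*y`. Equivalent C
    // only gets this with `restrict`, which is rarely used -- hence the
    // less-exercised optimizer paths.
    pub fn store_then_read(x: &mut i32, y: &i32) -> i32 {
        *x = 42;
        *y // may be loaded before or after the store under noalias
    }

    fn main() {
        let (mut a, b) = (0, 7);
        println!("{}", store_then_read(&mut a, &b)); // prints 7
    }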

I would be much more impressed by 16K lines of provably correct transformations with associated Lean proofs (or something), and/or something based on EGG: https://egraphs-good.github.io/
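
For what it's worth, the egg library behind that link exposes e-graph rewriting as a small Rust API. A minimal sketch along the lines of its documented examples (details may differ by version):

    use egg::{rewrite as rw, *};

    fn main() {
        // Rewrite rules are declared as data, not hand-coded imperative matching.
        let rules: &[Rewrite<SymbolLang, ()>] = &[
            rw!("commute-mul"; "(* ?x ?y)" => "(* ?y ?x)"),
            rw!("mul-1";       "(* ?x 1)" => "?x"),
            rw!("add-0";       "(+ ?x 0)" => "?x"),
        ];

        // Saturate the e-graph with all rewrites, then extract the cheapest
        // equivalent expression, instead of applying rules in a fixed order.
        let start: RecExpr<SymbolLang> = "(+ (* a 1) 0)".parse().unwrap();
        let runner = Runner::default().with_expr(&start).run(rules);
        let extractor = Extractor::new(&runner.egraph, AstSize);
        let (cost, best) = extractor.find_best(runner.roots[0]);
        println!("{best} (cost {cost})"); // expected: a (cost 1)
    }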

mananaysiempre 8 hours ago | parent [-]

On the other end of the optimizer size spectrum, a surprising place to find a DSL is LuaJIT’s “FOLD” stage: https://github.com/LuaJIT/LuaJIT/blob/v2.1/src/lj_opt_fold.c (it’s just pattern matching, more or less, that the DSL compiler distills down to a perfect hash).
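
As a rough illustration of the idea (not LuaJIT's actual code): the fold rules are data keyed by an instruction's shape, and dispatch is a single table lookup rather than a long if/else chain. A plain HashMap stands in here for the perfect hash that LuaJIT's DSL compiler generates at build time:

    use std::collections::HashMap;

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    enum Op { Add, Mul }
    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    enum Kind { Const, Any }

    #[derive(Clone, Copy, Debug)]
    struct Ins { op: Op, lhs: i64, rhs: i64, lhs_kind: Kind, rhs_kind: Kind }

    // A fold rule returns None to decline (e.g. on overflow), mirroring
    // "no fold" outcomes in a real engine.
    type FoldFn = fn(&Ins) -> Option<i64>;

    fn fold_const(ins: &Ins) -> Option<i64> {
        match ins.op {
            Op::Add => ins.lhs.checked_add(ins.rhs),
            Op::Mul => ins.lhs.checked_mul(ins.rhs),
        }
    }

    fn main() {
        // Rules keyed by (opcode, operand kinds); LuaJIT distills this
        // table into a perfect hash, here it's just a HashMap.
        let mut rules: HashMap<(Op, Kind, Kind), FoldFn> = HashMap::new();
        rules.insert((Op::Add, Kind::Const, Kind::Const), fold_const);
        rules.insert((Op::Mul, Kind::Const, Kind::Const), fold_const);

        let ins = Ins { op: Op::Mul, lhs: 6, rhs: 7, lhs_kind: Kind::Const, rhs_kind: Kind::Const };
        if let Some(f) = rules.get(&(ins.op, ins.lhs_kind, ins.rhs_kind)) {
            println!("folded to {:?}", f(&ins)); // folded to Some(42)
        }
    }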

afiori 12 hours ago | parent | prev [-]

Part of the issue is that it suggests the code grew in a spaghettified way. A single huge file is neither sufficient nor necessary evidence of that, but lacking external constraints (like an entire library deliberately developed as a single C header), it does suggest that the code organisation is not great.

anon291 11 hours ago | parent [-]

Hardware is often spaghetti anyway. There are a large number of considerations and conditions that can invalidate the ability to use certain ops, which would change the compilation strategy.

The idea of good abstractions and such falls apart the moment the target environment itself is not a good abstraction.

j-o-m 9 hours ago | parent | prev | next [-]

I find the real question is: are all 16,000 of those lines required to implement the optimization? How much of that is dealing with LLVM’s internal representation and the varying complexity of LLVM’s other internal structures?

zahlman 14 hours ago | parent | prev [-]

I do too, but I'm pretty sure I've seen worse.

bitwizeshift 11 hours ago | parent | prev [-]

Thank you, bumholes