▲ | jcranmer 7 days ago | |
So I decided to look at some open source repos I know decently well. The only one that seems to have a wiki is LLVM (https://deepwiki.com/llvm/llvm-project). Thoughts on the overview page: Okay, weird subset of the top-level directories. The high-level compilation pipeline diagram is... wrong? Like, Clang-AST is definitely part of the Clang frontend, and then you get to the optimization pipeline, which clearly fucks up the flow through vectorization and instruction selection (completely omitting GlobalISel as well, for that matter). The choice of backends to highlight is weird, and at the end of the day, it manages to completely omit some of the most important passes in LLVM (like InstCombine). Drilling down into the other pages... I mean, at no point does it even discuss or give an idea of what LLVM IR is. There's nothing about the pass manager, nothing about expected canonicalization of passes. It's got a weird fixation on some things (like the role of TargetLowering), but manages to elide pretty much any detail that is actually useful. The role of TableGen in several components is completely missing--and FWIW, understanding TableGen and its error messages is probably the single hardest part of putting together an LLVM backend, precisely the thing you'd want it to focus on. If I had to guess, it's overly fixated on things that happen to be very large files--I think everything it decided to focus on in a single page happens to be a 30kloc file or something. But that means it also misses the things that are so gargantuan they're split into multiple files--Clang codegen is ~100kloc and InstCombine is ~40kloc, but since they're in several 4-5kloc files instead of a large 26kloc file (SLPVectorizer) or 62kloc file (X86ISelLowering), they're simply not considered important and ignored. | ||
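To make the file-size guess concrete, here is a minimal sketch of how ranking by single-file size and ranking by per-directory totals can disagree. Everything in it is an assumption for illustration: the line counts are rounded from the figures in the comment, the `part{i}.cpp` names are placeholders rather than the real file lists, and nothing here reflects how DeepWiki actually picks its topics.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical line counts, loosely shaped after the figures quoted above.
files = {
    "llvm/lib/Target/X86/X86ISelLowering.cpp": 62_000,
    "llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp": 26_000,
}
# Placeholder file names: a component split into many mid-sized files.
files.update({f"llvm/lib/Transforms/InstCombine/part{i}.cpp": 4_500 for i in range(9)})
files.update({f"clang/lib/CodeGen/part{i}.cpp": 5_000 for i in range(20)})


def top_by_file(sizes, n):
    """Rank individual files by line count, largest first."""
    return sorted(sizes, key=sizes.get, reverse=True)[:n]


def top_by_directory(sizes, n):
    """Rank directories by the summed line count of the files they contain."""
    totals = defaultdict(int)
    for path, loc in sizes.items():
        totals[str(PurePosixPath(path).parent)] += loc
    return sorted(totals, key=totals.get, reverse=True)[:n]


print(top_by_file(files, 2))
print(top_by_directory(files, 3))
```

With these made-up numbers, the per-file ranking surfaces only X86ISelLowering and SLPVectorizer, while the per-directory ranking puts `clang/lib/CodeGen` first and brings `llvm/lib/Transforms/InstCombine` into the top three.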
▲ | IceHegel 7 days ago | parent | next [-] | |
Yeah this is my experience too. For projects I know well, the diagrams are not engineering quality. | ||
▲ | grokblah 7 days ago | parent | prev | next [-] | |
That’s a very intriguing observation. (I haven’t read how it works but…) I wonder if removing file sizes, commit counts, and other numerical metadata would have a significant impact on the output. Or if all of the files were glommed into one large input with path+filename markers? | ||
▲ | menaerus 5 days ago | parent | prev [-] | |
Nonetheless, I still think it's impressive, considering that the LLVM codebase is one of the most complex ones to be found in the wild. |