| ▲ | lpribis 8 hours ago | |
I was curious what information I could glean from these for some popular repos. Caveat: I'm primarily an low-level embedded developer so I don't interface with large open source projects at the source level very often (other than occasionally the linux kernel). I chose some projects at random that I use. *Mainline linux* Most changed files: pretty much what I expected for 1 and 2... the "cutting edge" of Linux development over other OSes -- bpf and containers. The bpf verifier and AMD GPU driver might get a boost in this list due to sheer LoCs in those files (26K and 14K respectively). An intel equivalent of amdgpu_dm is #21 in the list (drivers/gpu/drm/i915/display/intel_display.c) and nvidia is nowhere to be seen (presumably due to out-of-tree modules/blobs?).
Bus factor: obviously none. The top 4
Buggy files: Intel comes out on top of GPU drivers this time (twice). Along with KVM for x86(64), the main allocator, and BTRFS.
*GCC*Most changed files: IR autovectorization code, riscv heuristics tables, and C++ template handling (pt.c is "paramaterized types").
Buggy files: DWARF debuginfo generation, x86 heuristics tables, RS6000(?!) heuristic tables. I had to look up RS6000, it's an IBM instruction set from the 90s lol. cp-tree.h is an interesting file, it seems be the main C(++) AST datastructures.
*xfwm4*
Most changed files: the list is dominated by *.po localizations. I filtered these out. Even after this, I discovered there is very little active development in the last few years. If I extend to 4 years ago, I get:
1. src/client.c - Realizing this project is too "small" to glean much from this. client.c is just the core X client management code. Makes sense.
2. src/placement.c - Other core window management code.This has not told me much other than where most of the functionality of this project lies. Bus factor: Pretty huge. Not really an issue in this case due to lack of development I guess.
Files with bug commits: Very similar distribution to most changed files. Not enough datapoints in this one to draw any big conclusions.I think these massive open projects (excl xfwm) are generally pretty consistent code quality across the heavily trodden areas because of the amount of manpower available to refactor the pain points. I've yet to see an example of "god help you if you have to change that file" in e.g. linux, but I have of course seen that situation many times in large proprietary codebases. | ||
| ▲ | grepsedawk 7 hours ago | parent | next [-] | |
Big projects tend to self-correct. These commands hit differently on private codebases with 3-10 contributors, where high-churn usually means one person patching the same thing repeatedly. | ||
| ▲ | croemer 4 hours ago | parent | prev [-] | |
This is better than the post itself. Showing real output from a real repo. | ||