kburman 4 days ago
It’s an interesting direction, but feels pretty expensive for what might still be a guess at what matters. I’m not sure an LLM can really capture project-specific context yet from a single PR diff. Honestly, a simple data-driven heatmap showing which parts of the code change most often or correlate with past bugs would probably give reviewers more trustworthy signals.
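A minimal version of that churn heatmap can be pulled straight from git history. Rough sketch (my own cut at it, not any existing tool; it ignores renames and treats every commit equally):

```python
import subprocess
from collections import Counter

def churn_from_log(name_only_log: str) -> Counter:
    """Parse `git log --name-only --pretty=format:` output into
    per-file change counts (blank lines separate commits)."""
    return Counter(
        line.strip() for line in name_only_log.splitlines() if line.strip()
    )

def churn_heatmap(repo_path: str = ".", max_commits: int = 500) -> Counter:
    """Count how often each file was touched in the last N commits."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", f"-{max_commits}",
         "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    return churn_from_log(out)
```

Then `churn_heatmap().most_common(10)` gives you the hottest files. Correlating with past bugs is the harder half, agreed.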
CuriouslyC 4 days ago
This is not that expensive with Gemini: they give out free keys with plenty of requests/day. You can upload your diff plus a bundle of the relevant part of the codebase and get this behavior for free, at least for a small team with ~10-20 PRs/day. Assuming you could run this with personal keys, anyway.
ivanjermakov 4 days ago
Premise is amazing. Wonder if there are tools that do something similar by looking at diff entropy.
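None that I know of, but a crude version is easy to sketch. What "diff entropy" means is my own guess here: Shannon entropy over the whitespace tokens of added/removed lines, so repetitive mechanical changes score low and novel ones score high:

```python
import math
from collections import Counter

def diff_entropy(diff_text: str) -> float:
    """Shannon entropy (bits/token) over tokens of a unified diff.

    Only added/removed lines count; file headers (+++/---) are skipped.
    Low entropy suggests repetitive, mechanical edits.
    """
    tokens = []
    for line in diff_text.splitlines():
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
            tokens.extend(line[1:].split())
    if not tokens:
        return 0.0
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in Counter(tokens).values())
```

A rename-everywhere diff would land near zero; a diff introducing genuinely new logic would score higher.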
lawrencechen 4 days ago
Yeah, this is honestly pretty expensive to run today.

> I’m not sure an LLM can really capture project-specific context yet from a single PR diff.

We had an even more expensive approach that cloned the repo into a VM and prompted codex to explore the codebase and run code before returning the heatmap data structure. We decided against it for now due to latency and cost, but I think we'll revisit it to help the LLM get project context. Distillation should help a bit with cost, but I haven't experimented enough to have a definitive answer. Excited to play around with it, though!

> which parts of the code change most often or correlate with past bugs

I can think of a way to do the correlation, but it would require LLMs. Maybe I'm missing a simpler approach? But agreed that conditioning on past bugs would be great.
cerved 4 days ago
> Honestly, a simple data-driven heatmap showing which parts of the code change most often or correlate with past bugs would probably give reviewers more trustworthy signals.

At first I thought this too, but now I doubt that's a good heuristic. Hotspots are probably where people are careful and/or look anyway. If I were to guess, regressions are less likely to occur in "hotspots". But this is just a hunch. There are tons of well-reviewed open source projects with bug reports; it would be interesting if someone tested it.
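It would be cheap to test on any repo whose commit messages flag fixes. A rough sketch (keyword matching is a crude proxy for real bug reports, and the keywords here are my own choice): for each file, compare how often it's touched overall vs. by fix commits:

```python
from collections import Counter

FIX_WORDS = ("fix", "bug", "regression")  # crude proxy for bug-fix commits

def fix_rate_by_file(commits):
    """Given (message, files_touched) pairs, estimate the fraction of
    each file's changes that were bug fixes."""
    touched, fixed = Counter(), Counter()
    for message, files in commits:
        is_fix = any(word in message.lower() for word in FIX_WORDS)
        for f in files:
            touched[f] += 1
            if is_fix:
                fixed[f] += 1
    return {f: fixed[f] / touched[f] for f in touched}
```

If the hunch is right, high-churn files would show a *lower* fix rate than quiet ones, since they get the review attention.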
nonethewiser 4 days ago
A large portion of the lines of code I'm considering when I review a PR are not part of the diff. This has to be a common experience: think of how often you want to comment on a line of code or file that just isn't in the PR. It happens on almost every PR for me. They materialize as loose comments, or comments on a line like "Not this line per se, but what about XYZ?" or "You replaced this in 3 places, but I actually found 2 more it should be applied to." I mean, these tools are fine. But let's be on the same page that they can only address a sub-class of problems.