cadamsdotcom 3 days ago

Reviewer burden is going to worsen. Luckily, LLMs are good at writing code that checks code quality.

This is not running a prompt, which is probabilistic and so guarantees nothing! This is having an agent create a self-contained check that becomes part of the codebase and runs in milliseconds. It could do anything: walk the AST looking for one anti-pattern, check code conventions... a linter on steroids.
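As a minimal sketch of what such a check might look like (assumptions: a Python codebase under src/, with a bare “except:” clause standing in for whatever anti-pattern you’re hunting):

    #!/usr/bin/env python3
    # Minimal sketch of a self-contained check. The anti-pattern here
    # (a bare "except:" that swallows all exceptions) and the src/
    # layout are stand-ins; swap in whatever your codebase needs.
    import ast
    import sys
    from pathlib import Path

    def check_file(path: Path) -> list[str]:
        tree = ast.parse(path.read_text(), filename=str(path))
        return [
            f"{path}:{node.lineno}: bare except swallows all exceptions"
            for node in ast.walk(tree)
            if isinstance(node, ast.ExceptHandler) and node.type is None
        ]

    if __name__ == "__main__":
        failures = [e for p in Path("src").rglob("*.py") for e in check_file(p)]
        print("\n".join(failures))
        sys.exit(1 if failures else 0)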

Building and refining a library of such checks relieves maintainers’ burden and lets submitters check their own code.

I’m not just saying it - it’s worked super well for me. I am always adding checks to my codebase. They enforce architecture (“routes are banned from directly importing the DB; they must go via the service layer”, “no new dependencies”), and they inspect frontend code to find all the fetch calls and hrefs, then flag dead API routes and unlinked pages. With informative error messages, agents can tell when they’ve half-finished (or half-assed) an implementation. My favorite prompt is “keep going til the checks pass”.
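To make that concrete, here is a hedged sketch of the first check above, assuming a Python project where route handlers live under routes/ and the banned module is called app.db (both names invented for illustration):

    #!/usr/bin/env python3
    # Architecture check: files under routes/ must not import the DB
    # module directly; they have to go through the service layer.
    # "routes/" and "app.db" are hypothetical names.
    import ast
    import sys
    from pathlib import Path

    BANNED = "app.db"

    def offending_imports(path: Path) -> list[str]:
        errors = []
        for node in ast.walk(ast.parse(path.read_text(), filename=str(path))):
            names = []
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            for name in names:
                if name == BANNED or name.startswith(BANNED + "."):
                    # Informative message so an agent knows how to fix it.
                    errors.append(
                        f"{path}:{node.lineno}: routes must not import {name}; "
                        "call the service layer instead"
                    )
        return errors

    if __name__ == "__main__":
        failures = [e for p in Path("routes").rglob("*.py")
                    for e in offending_imports(p)]
        print("\n".join(failures))
        sys.exit(1 if failures else 0)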

What kernel reviewers do is complex - but I wonder how much of it can be turned into lore in this way, refined over time to make kernel development more foolproof even as the kernel grows more complex.

indiosmo 3 days ago

This resonates with my experience of using LLMs to build tooling.

I have a repo with several libraries where I need error codes to be globally unique, as well as adhere to a set of prefixes assigned to each library. This was enforced by carefully reviewing any commits that touched the error code headers.

I’d had a ticket open for years to write a tool for this, along with a general idea of the tool’s architecture, but never got around to implementing it.

I used LLMs to research design alternatives (clang tooling, tree-sitter, etc.) and eventually implement a tree-sitter-based Python tool that, given a JSON config of the library prefixes, checks that every error code adheres to its library’s prefix and that there are no duplicate error codes within a library.
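For illustration only, a much-simplified sketch of that kind of tool; it swaps the tree-sitter parse for a regex and invents both the config layout and a NET_1042-style code format:

    #!/usr/bin/env python3
    # Simplified sketch of the error-code checker described above.
    # Invented assumptions: codes look like NET_1042, and config.json
    # maps library dirs to prefixes, e.g. {"libnet": "NET"}. The real
    # tool parses the headers with tree-sitter; a regex stands in here.
    import json
    import re
    import sys
    from pathlib import Path

    CODE_RE = re.compile(r"\b([A-Z]+)_(\d+)\b")

    def main() -> int:
        config = json.loads(Path("config.json").read_text())
        failures = []
        seen = {}  # code -> file that first declared it (global uniqueness)
        for lib_dir, prefix in config.items():
            for header in Path(lib_dir).rglob("*.h"):
                for match in CODE_RE.finditer(header.read_text()):
                    code = match.group(0)
                    if match.group(1) != prefix:
                        failures.append(
                            f"{header}: {code} should use prefix {prefix}")
                    elif code in seen and seen[code] != str(header):
                        failures.append(
                            f"{header}: duplicate {code}, first seen in {seen[code]}")
                    else:
                        seen.setdefault(code, str(header))
        print("\n".join(failures))
        return 1 if failures else 0

    if __name__ == "__main__":
        sys.exit(main())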

This would probably have taken me at least a few days on my own (or, more likely, would just have sat in the backlog forever); it took about 3 hours.

cadamsdotcom 3 days ago

The ROI on those 3 hours is immense. Runs in milliseconds. No capitalized instructions in AGENTS.md begging models to behave. And you can refine it anytime to cover more cases!