captainkrtek 3 hours ago

One challenge with code review as an antidote to poor-quality gen-AI code is that we largely see only the code itself, not the process or inputs that produced it.

In the pre-gen-AI days, if an engineer put up a PR, it implied (somewhat) that they wrote the code, reviewed it implicitly as they wrote it, and made deliberate choices (i.e., could explain why this was the best approach).

If Claude is just the new high-level programming language, with natural language as the prompt, the challenge is that we're not reviewing the natural language: we're reviewing the machine code without knowing what the inputs were. I'm not sure of a solution to this, but something along the lines of knowing the history of the prompting that ultimately led to the PR, the time/tokens involved, etc. may inform the "quality" or "effort" spent in producing it. A one-shotted feature and a multi-iteration feature may produce the same lines of code and the same general shape, but the latter is likely to be higher "quality" in terms of minimal defects.
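To make the idea concrete, here's a rough sketch of what such a PR provenance record might look like. This is purely hypothetical (no such tooling is referenced above; all names are made up): a small structure capturing each prompt round and its token counts, so a reviewer could at least see iteration count and effort alongside the diff.

```python
from dataclasses import dataclass, field

@dataclass
class PromptStep:
    """One prompt/response round that contributed to the PR."""
    prompt: str
    tokens_in: int
    tokens_out: int

@dataclass
class PrProvenance:
    """Hypothetical provenance manifest attached to a PR."""
    model: str
    steps: list[PromptStep] = field(default_factory=list)

    @property
    def iterations(self) -> int:
        # A one-shotted PR would show a single step here.
        return len(self.steps)

    @property
    def total_tokens(self) -> int:
        return sum(s.tokens_in + s.tokens_out for s in self.steps)

# A multi-iteration feature leaves a longer trail than a one-shot:
prov = PrProvenance(model="claude")
prov.steps.append(PromptStep("add rate limiting to the API", 40, 900))
prov.steps.append(PromptStep("handle burst traffic; add tests", 35, 700))
print(prov.iterations, prov.total_tokens)  # → 2 1675
```

The point isn't this exact schema; it's that two identical-looking diffs would carry visibly different histories.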

Along the same lines, when I review a gen-AI-produced PR, it feels like reading assembly and having to reverse-engineer how we got here. The code may run and be perfectly fine, but I can't tell what the higher-level inputs were that produced it, or whether they were sufficient.