john01dav 8 hours ago

> Wherever LLM-generated code is used, it becomes the responsibility of the engineer. As part of this process of taking responsibility, self-review becomes essential: LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it. Moreover, once in the loop of peer review, generation should more or less be removed: if code review comments are addressed by wholesale re-generation, iterative review becomes impossible.

My general procedure for using an LLM to write code, which is in the spirit of what is advocated here, is:

1) First, feed the relevant existing code into an LLM. This is usually just a few source files from a larger project.

2) Describe what I want to do, either giving an architecture or letting the LLM generate one. I tell it not to write any code at this point.

3) Let it talk through the plan, and make sure that I like it. I converse to address any deficiencies that I see, and I almost always see some.

4) I then tell it to generate the code.

5) I skim & test the code to see if it's generally correct, and have it make corrections as needed.

6) Closely read the entire generated artifact, and make manual corrections (occasionally automated corrections like "replace all C style casts with the appropriate C++ style casts", followed by a review of the diff); a sketch of that cast cleanup follows this list.
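For concreteness, the cast cleanup in step 6 looks something like this (the function is hypothetical; the before/after pattern is the point):

```cpp
#include <cstdint>

// Hypothetical example of the mechanical fix described in step 6.
std::uint32_t scale_percent(double fraction) {
    // Before: return (std::uint32_t)(fraction * 100.0);
    // A C-style cast will silently fall back to reinterpret_cast or
    // const_cast semantics if the types underneath it ever change.
    // After: static_cast permits only the well-defined numeric conversion,
    // so a future type change that needs a different cast fails to compile.
    return static_cast<std::uint32_t>(fraction * 100.0);
}
```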

The hardest part for me is #6: I feel a strong emotional bias against doing it, since at that point I am not yet aware of any errors that would compel such a close read.

This allows me to operate at a higher level of abstraction (architecture) and removes the drudgery of turning an architectural idea into precise, written code. But in doing so, you abandon those details to a non-deterministic system. This is different from, for example, using a compiler or a higher-level VM language: with those tools, you can understand how they work, rapidly develop a good idea of what you're going to get, and rely on robust assurances. Understanding LLMs helps, but not to the same degree.

ryandrake 6 hours ago

I've found that your step 6 takes the vast majority of the time I spend programming with LLMs: 10x or more the combined time of steps 1-5. And that's when the code the LLM produced actually works. If it doesn't work (which happens quite often), then even more handholding and corrections are needed. It's really a grind. I'm still not sure whether I am net saving time using these tools.

I always wonder about the people who say LLMs save them so much time: Do you just accept the edits they make without reviewing each and every line?

hedgehog 5 hours ago

You can have the tool start by writing an implementation plan describing the overall approach and key details: references, snippets of code, a task list, etc. That is much faster to review and refine than a raw diff, and it lets you make sure the work matches your intent. Once the plan is acceptable, the changes are quick, and having the machine do a few rounds of refinement to make sure the diff vs. HEAD matches the plan irons out some of the easy issues before human eyes show up. The final review is then easier because you are only checking for smaller issues and for consistency with a plan you already signed off on; a sketch of such a plan follows below.

It's not magic, though; this still takes some time to do.
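A minimal sketch of what such a plan file might look like (every path, helper, and task here is hypothetical):

```
# Plan: add retry with backoff to the upload client (hypothetical task)

## Approach
Wrap upload_file() in a retry loop with exponential backoff. No public API changes.

## References
- src/upload.ts: current upload_file() implementation
- src/net/backoff.ts: existing backoff helper to reuse

## Task list
- [ ] Extract the single HTTP call into try_upload_once()
- [ ] Add the retry loop: max 3 attempts, 2^n-second backoff
- [ ] Unit test: upload succeeds on the second attempt
```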

Jaygles 5 hours ago

I exclusively use the autocomplete in Cursor. I hate reviewing huge chunks of LLM code at one time. With the autocomplete, I'm in full control of the larger design and am able to quickly review each piece of LLM-generated code. Very often it generates what I was going to type myself.

Anything that involves math or complicated conditions I take extra time on.

I feel I'm getting code written 2 to 3 times faster this way while maintaining high quality and confidence.

zeroonetwothree 4 hours ago

Maybe it subjectively feels like 2-3x faster, but in studies that measure it we tend to see smaller improvements, in the range of 20-30% faster. It could be that you are an outlier, of course.

mythrwy 4 hours ago

If it's stuff I have been doing for years and isn't terribly complex, I've found it's generally quick to skim-review. I don't need to read every line; I can glance at it and know it's a loop and why, a function call, or whatever. If I see something unusual, I take that as an opportunity to learn.

I've seen LLMs write some really bad code a few times lately; it seems almost worse than what they were doing 6 or 8 months ago. Could be my imagination, but it seems that way.

ec109685 6 hours ago

Don’t make manual corrections.

If you keep all edits driven by the LLM, it can use that knowledge later in the session, or you can ask your model to commit the guidelines to long-term memory.
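For instance, the committed guidelines might end up as an entry like this (a hypothetical excerpt; Claude Code keeps this kind of memory in a CLAUDE.md file, and other agents use similar files such as AGENT.md):

```
# CLAUDE.md (hypothetical excerpt)

## Code style guidelines learned this session
- Use C++-style casts (static_cast etc.); never C-style casts.
- Prefer early returns over deeply nested conditionals.
- Error handling: follow the pattern in src/errors.cpp (hypothetical path).
```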

klauserc an hour ago

The best way to get an LLM to follow style is to make sure that this style is evident in the codebase. Excessive instructions (whether through memories or AGENT.md) do not help as much.

Personally, I absolutely hate instructing agents to make corrections. It's like pushing a wet noodle. If there's a lot to correct, fix one or two cases manually and tell the LLM to follow that pattern.

https://www.humanlayer.dev/blog/writing-a-good-claude-md