tptacek 3 days ago

This article makes points that are valid in general but not apposite to code generation with agents.

It is indeed difficult to verify a piece of code that is either going to ship as-is, or with the specific modifications your verification identifies. In cryptography engineering the rule of thumb (never followed in practice) is a verification cost of 10x the implementation cost. Verification is hard and expensive.

But qualifying agent-generated code isn't the verification problem, in the same way that validating an eBPF program in the kernel isn't solving the halting problem.

That's because the agentic scenario gives us an additional outcome: we can allow the code as-is, we can make modifications to the code, or we can throw out the code and re-prompt --- discarding (many) probably-valid programs in a search for the subset of programs that are easy to validate.
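
Concretely, that loop looks something like the sketch below. Every name in it (generate, quick_review, apply_fixes) is a placeholder, not any particular tool's API:

    # Minimal sketch of the accept / fix / reject-and-reprompt loop described above.
    # All helpers here are placeholders, not a real agent framework.
    from typing import Callable, Optional

    def acquire_patch(
        prompt: str,
        generate: Callable[[str], str],
        quick_review: Callable[[str], str],   # returns "accept", "fixable", or "gnarly"
        apply_fixes: Callable[[str], str],
        max_attempts: int = 5,
    ) -> Optional[str]:
        # Unlike classical verification, we are free to discard probably-valid
        # programs: we only keep output that falls in the easy-to-validate subset.
        for _ in range(max_attempts):
            patch = generate(prompt)
            verdict = quick_review(patch)
            if verdict == "accept":       # boring, right shape: take it as-is
                return patch
            if verdict == "fixable":      # small, obvious corrections
                return apply_fixes(patch)
            # gnarly: throw it out and re-prompt for something more boring
            prompt += "\nSimplify: make a smaller, more conventional change."
        return None                       # give up and write it by hand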

In practice, most of what people generate is boring and easy to validate: you know within a couple minutes whether it's the right shape, whether anything sticks out the wrong way, the way a veteran chess player can quickly pattern-match a whole chessboard. When it isn't boring, you read carefully (and expensively), or you just say "no, try again, give me something more boring", or you break your LLM generation into smaller steps that are easier to pattern match and recurse.

What professionals generally don't (I think) do with LLMs is generate large gnarly PRs, all at once, and then do a close-reading of those gnarly PRs. They read PRs that are easy (not gnarly). They reject the gnarly ones, and compensate for gnarliness by approaching the problem at a smaller level of granularity. Or, you know, just write those bits by hand!
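
Roughly, that "go smaller instead of reading harder" move is a recursion, something like this sketch (again, every helper is a placeholder):

    # Minimal sketch of compensating for gnarliness with smaller granularity.
    def produce_change(task, generate, is_boring, split_task, write_by_hand):
        patch = generate(task)
        if is_boring(patch):                 # easy PR: read it quickly, accept it
            return [patch]
        subtasks = split_task(task)          # too gnarly: go a level smaller
        if len(subtasks) <= 1:               # can't usefully split: do it yourself
            return [write_by_hand(task)]
        patches = []
        for sub in subtasks:                 # recurse on the easier pieces
            patches += produce_change(sub, generate, is_boring, split_task, write_by_hand)
        return patches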

kiitos an hour ago

> In practice, most of what people generate is boring and easy to validate: you know within a couple minutes whether it's the right shape, whether anything sticks out the wrong way, the way a veteran chess player can quickly pattern-match a whole chessboard. When it isn't boring, you read carefully (and expensively), or you just say "no, try again, give me something more boring", or you break your LLM generation into smaller steps that are easier to pattern match and recurse.

nope!

https://sketch.dev/blog/our-first-outage-from-llm-written-co...