zozbot234 6 hours ago

> The program wants to depend on some invariant, say that a particular list is always sorted, and the code maintains it by always inserting elements in the right place in the list.

Invariants must be documented as part of defining the data or program module, and ideally they should be restated at any place they're being relied upon. If you fail to do so, that's a major failure of modularity and it's completely foreseeable that you'll have trouble evolving that code.
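To make the sorted-list example concrete, here's a minimal sketch (names and the `SortedScores` class are illustrative, not from the original discussion) of documenting the invariant at the point of definition and restating it where it's relied upon:

```python
import bisect

class SortedScores:
    """A collection of scores.

    Invariant: self.scores is always sorted ascending. Every mutation
    must preserve this; read paths (e.g. median) depend on it.
    """

    def __init__(self):
        self.scores = []

    def add(self, score):
        # Insert at the correct position so the invariant holds after every add.
        bisect.insort(self.scores, score)

    def median(self):
        # Relies on the sortedness invariant -- no re-sorting or checking here.
        n = len(self.scores)
        mid = n // 2
        if n % 2:
            return self.scores[mid]
        return (self.scores[mid - 1] + self.scores[mid]) / 2
```

The point is that `median` restates its reliance on the invariant rather than defensively re-sorting, which is exactly the discipline being argued for.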

pron 6 hours ago | parent [-]

Right, except that even when the invariants are documented, agents get into trouble. Virtually every week I see the agent write strange code with multiple paths. It knows that the invariant _should_ hold, yet it still writes a workaround for cases where it doesn't. Even more frequently, the agent knows a certain exception shouldn't occur, but it does, and half the time it chooses to investigate while the other half it shrugs and catches the exception. In fact, it's worse: sometimes it proactively catches exceptions that shouldn't occur at all, as part of its "success at all costs" drive, and all these contingency plans it builds into the code make it very hard (even for the agent) to figure out why things go wrong.
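The anti-pattern described above can be sketched in a few lines (the `mean` example is hypothetical, chosen only to illustrate the contrast):

```python
def mean_with_contingency(scores):
    # The "contingency" style: silently mask a case the invariant says
    # can't occur (an empty list), so the real bug never surfaces and a
    # bogus 0.0 propagates through the rest of the program.
    try:
        return sum(scores) / len(scores)
    except ZeroDivisionError:
        return 0.0  # workaround for a "can't happen" case

def mean_fail_fast(scores):
    # Fail-fast alternative: state the invariant and let a violation
    # crash loudly at the point of the bug instead of downstream.
    assert scores, "invariant violated: scores must be non-empty"
    return sum(scores) / len(scores)
```

The first version is the kind of code that makes breakage hard to reproduce: the failure is absorbed far from its cause.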

Most importantly, this isn't hypothetical. We see that agents write programs that, after some number of changes, simply collapse because they don't converge. They don't transition well between layers of abstraction, so they build contingencies into multiple layers, and the result is that after some time the codebase is broken beyond repair and no change can be made without breaking something (and because of all the contingencies, reproducing the breakage can be hard). This is why agents don't succeed in building even something as simple as a workable C compiler, even with a full spec and thousands of human-written tests.

If the agents could code well, no one would be complaining. People complain because agent code becomes structurally unsound over time, and then it's only a matter of time until it collapses. Every fix and change you make without super careful supervision has a high chance of weakening the structure.

zozbot234 6 hours ago | parent [-]

Agents don't really know the whole codebase when they're writing the code; their context is way too tiny for that, and growing the context window doesn't really help (most of it gets ignored). So they're always working piecemeal, and these failures are entirely expected unless the codebase is rigorously built for modularity and the agent is told to work "in the small" and keep to the existing constraints.

pron 4 hours ago | parent [-]

> Agents don't really know the whole codebase when they're writing the code

Neither do people, yet people manage to write software that they can evolve over a long time, and agents have yet to do that. I think it's because people can move back and forth between levels of abstraction, and they know when it's best to do it, but agents seem to have a really hard time doing that.

On the other hand, agents are very good at debugging complex bugs that span many parts of the codebase, and they manage to do that even with their context limitations. So I don't think it's about context. They're just not smart enough to write stable code yet.