pron · 2 days ago
Agents just can't currently do that well. When you run into a problem while evolving the code to add a new feature or fix a bug, you need to decide whether the change belongs in the architecture or should be made locally. Agents are about as good as a random choice at picking the right answer, and there's typically only one right answer. They simply don't have the judgment. Sometimes you get the wrong choice in one session and the right choice in another. And this happens at every level, because there are many more than just two abstraction levels: e.g., do I change a subroutine's signature, or do I change the call site? Agents get it wrong. A lot.

Another thing they just don't get (because they're so focused on task success) is that it's very often better to let things go wrong in a way that can inform changes than to get things to "work" in a way that hides the problem. One of the reasons agent code needs to be reviewed even more carefully than human code is that agents are really good at hiding issues with potentially catastrophic consequences.
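The signature-vs-callsite dilemma above can be sketched with a hypothetical helper (the function names and the cents/dollars mismatch are invented for illustration). Both fixes make the tests pass; only one of them is right for the codebase, and nothing local to the change tells you which:

```python
# Hypothetical scenario: a helper assumes amounts in cents,
# but a new caller has dollars. Two ways to "fix" it:

def format_price(cents: int) -> str:
    """Original contract: takes an amount in cents."""
    return f"${cents / 100:.2f}"

# Option 1: patch locally at the call site -- the helper's
# contract is preserved, all existing callers are untouched.
def receipt_line(dollars: float) -> str:
    return format_price(round(dollars * 100))

# Option 2: change the signature -- the API now matches how new
# code wants to use it, but every existing caller must be updated
# (and a missed one silently produces prices off by 100x).
def format_price_v2(dollars: float) -> str:
    return f"${dollars:.2f}"

print(receipt_line(19.99))
print(format_price_v2(19.99))
```

Either option "works" in isolation, which is exactly why task-success-driven agents flip between them: the correct choice depends on architectural intent that isn't visible in the diff.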
zozbot234 · 2 days ago · parent
> Agents are about as good as a random choice in picking the right answer, and there's typically only one right answer.

That's realistically because they aren't even trying to answer that question by reasoning sensibly about the code. Working within a limited context, whatever they do leaves them guessing and reaching for the first thing that might work. That's why they generally do a bit better when you explicitly ask them to reverse engineer or document the design of an existing codebase: that problem at least carries an explicit requirement to comprehensively survey the code, figure out which parts matter, and so on. They can't be expected to do that by default. It's not even a limitation of existing models; it's quite inherent to how they're architected.