zozbot234 2 days ago
> Agents are about as good as a random choice in picking the right answer, and there's typically only one right answer. That's realistically because they aren't even trying to answer that question by thinking sensibly about the code. Working in a limited context in everything they do leaves them guessing and trying the first thing that might work. That's why they generally do a bit better when you explicitly ask them to reverse engineer or document the design of an existing codebase: that's a problem that at least involves an explicit requirement to comprehensively survey the code, figure out which parts matter, etc. They can't be expected to do that by default. It's not even a limitation of existing models, it's quite inherent to how they're architected.
pron 2 days ago
Yes, and I think there's a fundamental problem here. The big reason "AI thought leadership" claims that AI should do well at coding is that there are mechanical success metrics like tests. Except that's not true. The tests cover the behaviour, not the structure. It's like constructing a building where the only tests are whether the floorplans match the design. That makes catastrophic structural issues easy to hide. The building looks right, and it might even withstand some load, but later, when you want to make changes, you move a cupboard or a curtain rod only to have the structure collapse because that element ended up being load-bearing. It's funny, but one of the lessons I've learnt working with agents is just how much design matters in software, and that it isn't just a matter of craftsmanship pride. When you see the codebase implode after the tenth new feature and realise it has to be scrapped because neither human nor AI can salvage it, the importance of design becomes palpable. Before agents it was hard to see because few people write code like that (just as no one would think to make a curtain rod load-bearing when building a structure). And let's not forget that the models hallucinate. Just now I was discussing architecture with Codex, and what it said sounded plausible, but it was wrong in subtle and important ways.