SaberTail | 4 hours ago
I was on a greenfield project late last year with a team that was very enthusiastic about coding agents. I would personally call it a failure, and the project is quietly being wound down after only a few months. It went in a few stages:

At first, it proceeded very quickly. Using agents, the team was able to generate a lot of code very fast, and so they were checking off requirements at an amazing pace. PRs were rubber-stamped, and most of the time I tried to offer feedback I found myself arguing with copy/pasted answers from an agent.

As the components started to get more integrated, things started breaking. At first these were obvious problems with easy fixes, like code calling other code with the wrong arguments, and the coding agents could handle those. But a lot of the code was written in the overly defensive style agents are fond of, so there were far more subtle errors: things like the agent substituting an invalid default value instead of erroring out, so the failure surfaced far away from where that value was actually causing problems.

At this point, the agents started making things strictly worse, because they couldn't fit that much code in their context. Instead of actually fixing bugs, they'd catch any exceptions and substitute in more defaults. Some engineers did manual work to remove a lot of the defensive code, but they could not keep up with the agents. This is also about when the team discovered that most of the tests were effectively "assert true" because they mocked out so much. (I've put rough sketches of both of these patterns at the end of this comment.)

We did ship the project, but it shipped in an incredibly buggy state, and the performance was terrible. And, as I said, it's now being wound down. That's probably the right call, because it would be easier to restart from scratch than to make sense of the mess we ended up with. Agents were also used to write the documentation, and very little of it is comprehensible.

We did screw some things up. People were so enthusiastic about agents, and the agents produced so much code so fast, that code reviews were essentially non-existent. Instead of acting on review feedback, a lot of the time there was an LLM-generated "won't do" response that sounded plausible enough to convince managers that the reviewers were slowing things down. We also didn't explicitly decide up front how things like error handling or logging should work, so what the agents did was all over the place depending on what happened to be in their context.

Maybe the whole mess was a necessary learning experience as we figure out these new ways of working. Personally, I'm still using coding agents, but very selectively, to "fill in the blanks" in code where I know what it should look like but don't need to write it all by hand myself.
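
To make the "defensive defaults" pattern concrete, here's a rough sketch of the kind of thing the agents kept producing. This is not our actual code; the config key and function names are made up, it's just the shape of the anti-pattern:

    # Hypothetical sketch of the agent-written "defensive" style.
    # Every layer swallows the error and substitutes a default, so the
    # failure only surfaces much later, far from the real cause.

    def load_retention_days(config: dict) -> int:
        try:
            return int(config["retention_days"])
        except (KeyError, ValueError, TypeError):
            # Bad or missing config silently becomes 0 instead of an error...
            return 0

    def purge_old_records(records: list[dict], config: dict) -> list[dict]:
        days = load_retention_days(config)
        try:
            # ...so a typo in the config quietly filters out everything,
            # and the bug shows up as "missing data" somewhere unrelated.
            return [r for r in records if r.get("age_days", 0) < days]
        except Exception:
            # Catch-all added by the agent while "fixing" a later crash.
            return []

Debugging this kind of thing is miserable precisely because nothing ever throws where the mistake was made.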
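
And the "assert true" tests looked roughly like this (again hypothetical; the billing module and names are invented to illustrate the pattern): so much was patched out, sometimes including the function under test itself, that the assertion could only ever see the mock's canned return value.

    # Hypothetical illustration of the over-mocked tests. This passes no
    # matter what the real billing.invoice_total does.
    from unittest.mock import patch

    import billing  # hypothetical module under test

    @patch("billing.invoice_total", return_value=100)
    def test_invoice_total(mock_total):
        # Only checks that the mock returns what the mock was told to return.
        assert billing.invoice_total("customer-1") == 100

The subtler variant mocked every dependency instead, so the test verified the mock wiring rather than any behavior, but the effect was the same: green checkmarks and no coverage of the actual logic.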