In languages with strong compile-time checks (like say rust) the obvious problems can mostly be solved by having the agent try to compile the program as a last step, and most agents now do that on their own. In cases where that doesn't work (more permissive languages like python, or http APIs) you can have the AI write tests and execute them. Or ask the AI to prototype and test features separately before adding them to the codebase. Adding MCP servers with documentation also helps a ton.

The real issues I'm struggling with are more subtle, like unnecessary code duplication, code that seems useful but is never called, doing the right work but in the wrong place, security issues, performance issues, not implementing the prompt correctly when it's not straight forward, implementing the prompt verbatim when a closer inspection of the libraries and technologies used reveals a much better way, etc. Mostly things you will catch in code review if you really pay attention. But whether that's faster than doing the task yourself greatly depends on the task at hand

▲

lsaferite 5 days ago | parent | next [-]

The subtle bugs are horrible.

We used Claude Code the other day to add a new record type to an API and it was mostly right. CC decided (for some weird reason) to use a slightly different return shape on a list endpoint than the entire rest of the API. It changed two field names (count/items became total_count/data). This divergence was missed until the code was released because it 'worked' and had full tests and everything. But when the standard client lib code was used to access the API it failed on the list endpoint. Didn't take long to discover the issue. Luckily, it was a new feature so nothing broke, but it was a very clear reminder that you have to be very thorough when reviewing coding agent PRs.

FWIW, I use CC frequently and have mostly positive things to say about it as a tool.

	▲	5 days ago \| parent [-]
		[deleted]

▲

Disposal8433 6 days ago | parent | prev [-]

> the obvious problems can mostly be solved by having the agent try to compile the program

The famous "It compiles on my machine." Is that where engineering is going? Spending $billions to get the same result as the laziest developer ever?

	▲	wongarsu 6 days ago \| parent [-]
		If it compiles on my machine then the library and all called methods exist and are not hallucinated. If it runs on my machine then the called external APIs exist and are not hallucinated That obviously does not mean that it's good software. That's why the rest of my comment exists. But "AI is hallucinating libraries/APIs" is something that can be trivially solved with good software practices from the 00s, and that the AI can resolve by itself using those techniques. It's annoying for autocomplete AI, but for agents it's a non-issue