tmpz22 a day ago
And IMO it has a long way to go. There is a lot of nuance when orchestrating dependencies that can cause subtle errors in an application that are not easily remedied. For example, a lot of LLMs (I've seen it in Gemini 2.5 and Claude 3.7) will code non-existent methods in dynamic languages. While these runtime errors are often auto-fixable, sometimes they aren't, and breaking out of an agentic workflow to deep-dive the problem is quite frustrating - if mostly because agentic coding entices us into being so lazy.
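To make that concrete, here's a hypothetical example of the failure mode (the module is real, the call is invented): in Python, nothing complains at import time, and the bad call only blows up when that code path actually runs.

    import json

    def load_config(path):
        with open(path) as f:
            data = json.load(f)
        # an LLM "remembering" an API that doesn't exist: json has no validate();
        # the import succeeds, and this only raises AttributeError at runtime
        json.validate(data)
        return data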
mikepurvis a day ago
"... and breaking out of an agentic workflow to deep dive the problem is quite frustrating" Maybe that's the problem that needs solving then? The threshold doesn't have to be "bot capable of doing entire task end to end", like it could also be "bot does 90% of task, the worst and most boring part, human steps in at the end to help with the one bit that is more tricky". Or better yet, the bot is able to recognize its own limitations and proactively surface these instances, be like hey human I'm not sure what to do in this case; based on the docs I think it should be A or B, but I also feel like C should be possible yet I can't get any of them to work, what do you think? As humans, it's perfectly normal to put up a WIP PR and then solicit this type of feedback from our colleagues; why would a bot be any different? | |||||||||||||||||
jasonthorsness a day ago
The agents will definitely need a way to evaluate their work just as well as a human would - whether that's a full test suite, tests plus directions for some manual verification, or something else. If they can't use the same tools a human would, they'll never be able to improve things safely.
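A rough sketch of that kind of gate (pytest and ruff are just example commands a human might run; substitute whatever the project actually uses):

    import subprocess

    def change_passes_review(repo_dir):
        # run the same checks a human reviewer would before accepting the patch
        checks = [
            ["pytest", "-q"],         # the project's full test suite
            ["ruff", "check", "."],   # lint / static checks
        ]
        for cmd in checks:
            if subprocess.run(cmd, cwd=repo_dir).returncode != 0:
                return False  # reject and feed the failure back to the agent
        return True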
soperj a day ago
> if mostly because agentic coding entices us into being so lazy.

Any coding I've done with Claude has been to ask it to build specific methods; if you don't understand what's actually happening, then you're building something that's unmaintainable. I feel like it mostly reduces typing and syntax errors, though sometimes it leads me down the wrong path.