"... and breaking out of an agentic workflow to deep dive the problem is quite frustrating"

Maybe that's the problem that needs solving then? The threshold doesn't have to be "bot capable of doing entire task end to end", like it could also be "bot does 90% of task, the worst and most boring part, human steps in at the end to help with the one bit that is more tricky".

Or better yet, the bot is able to recognize its own limitations and proactively surface these instances, be like hey human I'm not sure what to do in this case; based on the docs I think it should be A or B, but I also feel like C should be possible yet I can't get any of them to work, what do you think?

As humans, it's perfectly normal to put up a WIP PR and then solicit this type of feedback from our colleagues; why would a bot be any different?

▲

dvfjsdhgfv 2 months ago | parent [-]

> Maybe that's the problem that needs solving then? The threshold doesn't have to be "bot capable of doing entire task end to end", like it could also be "bot does 90% of task, the worst and most boring part, human steps in at the end to help with the one bit that is more tricky".

Still, the big short-term danger being you're left with code that seems to work well but has subtle bugs in it, and the long-term danger is that you're left with a codebase you're not familiar with.

	▲	mikepurvis 2 months ago \| parent [-]
		Being left with an unfamiliar codebase is always a concern and comes about through regular attrition, particularly if inadequate review is not in place or people are cycling in and out of the org too fast for proper knowledge transfer (so, cultural problems basically). If anything, I'd bet that agent-written code will get better review than average because the turn around time on fixes is fast and no one will sass you for nit-picking, so it's "worth it" to look closely and ensure it's done just the way you want.