| ▲ | Festro 2 days ago | |
I can't speak to coding as it's not my area, but the pattern I've spotted is that it's best at grunt work. That's where the time savings kick in: browsing sites, linking up data, spotting anomalies, writing documentation, formatting documents, etc. If a task isn't repetitive or doesn't involve ingesting data, then I think the time savings shrink rapidly and the need for oversight increases massively. I think some people are managing to set up enough automated oversight to get round that, but it adds a layer that multiplies your token usage and still comes with no guarantee. Still, all these layers being added are increasing success rates. Andrej Karpathy is speaking about barely coding now. He has a bias, and a comment like that from him is marketing for Anthropic, but I believe he's found some groove with his setup to achieve that. I think the status quo this month in 2026 is that the best tips and tricks for getting usable answers out of ChatGPT a year ago have been consolidated into what we now call memory and skills in Claude and other agent-harness-type systems. You might need to explore those more; in fact, I think Claude Code and Cursor have even more layers for checking outputs that I've not even seen in Claude Desktop. And I think people with your exact issue, and the vast numbers who share that experience, are an audience the app makers want to do a better job of convincing. The free tiers and marketing sites are going to step up their game gradually, and there will be new features that lower failure rates even more. | ||