vidarh 3 hours ago

I'm currently testing Claude Code for a project where it isn't coding. But the workflows built with it are now making me money after ~2 weeks, and I've previously done the same work manually, so I know the turnaround time: The turnaround for each deliverable is ~2 days with Claude and the fastest I've ever done it manually was 21 days. (Yes, I'm being intentionally vague - there isn't much of a moat for that project given how close Claude gets with very little prompting)

There are absolutely maintainability challenges. You can't just tell these tools to build X and expect to get away with not reviewing the output and/or telling it to revise it.

But if you loosen the reins and review finished output rather than sit there and metaphorically look over its shoulder for every edit, the time it takes me to get it to revise its work until the quality is what I'd expect of myself is still a tiny fraction of what it'd take me to do things manually.

The time estimate above includes my manual time spent on reviews and fixes. I expect the time savings to increase, as about half of the time I spend on this project now goes to improving guardrails and adding agents etc. to refine the work automatically before I even glance at the output.

The biggest lesson for me is that when people are not getting good results, most of the time it seems to me it's because they keep watching every step their agent takes, instead of putting in place a decent agent loop (create a plan for X; for each item on the plan: run tests until it works, review your code and fix any identified issues, repeat until the tests and review pass without any issues) and letting the agent work until it stops before spending any time reviewing the result.
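To make the shape of that loop concrete, here's a rough Python sketch. It's an illustration, not what Claude Code does internally: `run_agent_loop`, `StepResult`, and the `implement` / `run_tests` / `review` callables are all hypothetical names standing in for however you invoke your agent, your test suite, and your review agent. The `max_rounds` cap is my own assumption so the loop can't spin forever.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    """Outcome of a test run or a review pass (hypothetical type)."""
    ok: bool
    notes: str = ""


def run_agent_loop(
    plan: List[str],
    implement: Callable[[str, str], None],   # (item, feedback) -> None
    run_tests: Callable[[str], StepResult],
    review: Callable[[str], StepResult],
    max_rounds: int = 5,                     # assumed safety cap
) -> List[str]:
    """Work through every plan item unattended.

    Returns the items that never passed both tests and review,
    i.e. the only ones a human needs to look at more closely.
    """
    stuck = []
    for item in plan:
        feedback = ""
        for _ in range(max_rounds):
            implement(item, feedback)
            tests = run_tests(item)
            if not tests.ok:
                feedback = tests.notes       # feed failures back, retry
                continue
            verdict = review(item)
            if verdict.ok:
                break                        # tests and review both pass
            feedback = verdict.notes         # fix review issues, retry
        else:
            # Ran out of rounds without a clean pass.
            stuck.append(item)
    return stuck
```

The point is that the human only shows up after `run_agent_loop` returns. Anything in the returned list is a candidate for the step-by-step treatment; everything else gets reviewed as finished output.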

Only when the agent repeatedly fails to do an assigned task adequately do I "slow it down" and have it do things step by step to figure out where it gets stuck / goes wrong. At that point I tell it to revise the agents accordingly, and then have it try again.

It's not cost effective to have expensive humans babysit cheap LLMs, yet a lot of people seem to want to babysit the LLMs.