I believe the author explicitly suggests strategies to deal with this problem; that's the entire second half of the post. There’s a big difference between acting as a human tester in the middle and building out enough guardrails that it can do meaningful autonomous work with verification.

WhyOhWhyQ 16 hours ago

I'm just extremely skeptical about that because I had many ideas like that and it still ended up being miserable. Maybe with Opus 4.5 things would go better though. I did choose an extremely ambitious project to be fair. If I were to try it again I would pick something more standard and a lot smaller.

I put like 400 hours into it by the way.

stantonius 15 hours ago

This is so relatable it's painful: many many hours of work, overly ambitious project, now feeling discouraged (but hopefully not willing to give up). It's some small consolation to me to know others have found themselves in this boat.

Maybe we were just 6 months too early to start?

Best of luck finishing it up. You can do it.

WhyOhWhyQ 15 hours ago

Thanks! Yes, I won't give up. The plan now is to focus on getting an income and try again in the future.

irrationalfab 15 hours ago

+1... like with a large enough engineering team, this is ultimately a guardrails problem, which in my experience with agentic coding is very solvable, at least in certain domains.

majormajor 10 hours ago

Like with large engineering teams, I have little faith people will suddenly get the discipline to do the tedious, annoying, difficult work of building good enough guardrails now. We don't even build guardrails that keep humans who test stuff as they go from introducing subtle bugs by accident; removing more eyes from that introduces new risks (although LLMs are also better at avoiding certain types of bugs, like copypasta shit). "Test your tests" gets very difficult as a product evolves and increases in complexity. Few contracts (whether at the unit test level or the "automation clicking on the element on the page" level) are static enough to avoid needing to rework the tests, which means reworking the testing of the tests, ... I think we'll find out just how low the general public's tolerance for bugs and regressions is.