> And then I started realizing: all the knowledge I have accumulated over the years: the trade-offs between implementations, how acquiring works, how to structure idempotency to prevent double-charges, everything, was becoming useless.

How is that true? I've been using Opus at industry scale over the last 6 months, and this just doesn't match reality.

It has consistently, with some probability each time (and no, claude.md and skills do not stop it fully):

* Suggested removing tests so that things pass.

* Suggested removing an error so that things can be "unblocked".

* Suggested using a second path when the original path ran into a problem, instead of making the original path accommodate that possibility.

* Suggested or silently added "features" or "guardrails" that I don't want.

* Floundered when left unsupervised without a clear goal it can verify against by itself (e.g. this test in the integration environment must be fixed). It can be left unsupervised only when given such a goal.

I'm not using just the native harness (e.g. CC) either. With an additional, customized harness, the behavior improves somewhat but is still fundamentally constrained and cannot really be trusted without verification.

See my methodology (100% handwritten): https://aperocky.com/blog/post.html?slug=agentic-development....

Being a heavy user, I think I've run into every single hallucination the model can produce across development, release, and operations. I am still a heavy user, but there is a lot of value in recognizing exactly where the LLM's limits are and working around them.

causal 3 hours ago

To me the greatest monument to Claude's poor software quality is Claude Code itself.

	Aperocky 3 hours ago
		Yes, let's build a 40K-line main loop! I wonder if they thought Claude Code needed to be more like an LLM to work lmao.