This is a common workflow that most advanced users are familiar with.

Yet even following it to a T, and being really careful with how you manage context, the LLM will still hallucinate, generate non-working code, steer you into wrong directions and dead ends, and just waste your time in most scenarios. There's no magical workflow or workaround for avoiding this. These issues are inherent to the technology, and have been since its inception. The tools have certainly gotten more capable, and the ecosystem has matured greatly in the last couple of years, but these issues remain unsolved. The idea that people who experience them are not using the tools correctly is insulting.

I'm not saying that the current generation of this tech isn't useful. I've found it very useful for the same scenarios GP mentioned. But the above issues prevent me from relying on it for anything more sophisticated than that.

brulard 3 days ago

> These issues are inherent to the technology

That's simply false. Even if LLMs don't produce correct and valid code on the first shot 100% of the time, if you use an agent, it's simply a matter of iterations. I have Claude Code connected to Playwright and to context7 for docs, so it can iterate by itself if there are syntax errors, runtime errors or problems with the data on the backend side. Currently I have near zero cases where it does not produce valid working code. If it is incorrect in some aspect, it is then not that hard to steer it to a better solution or to fix it yourself.

And even if it failed to implement most of the stages of the plan, it's not all wasted time. I brainstormed ideas, formed the requirements and feature specifications, and have clear documentation, an implementation plan, unit tests, etc., and I can use those to code it myself. So even in the worst case scenario my development workflow is improved.
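
A minimal sketch of the iterate-on-errors loop described in this comment, not the commenter's actual setup. `generate` is a hypothetical stand-in for whatever produces code (for example, a model call) and is not an API of Claude Code, Playwright or context7; `run_candidate` and `iterate` are likewise illustrative names.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, Optional


def run_candidate(source: str, timeout: float = 10.0) -> Optional[str]:
    """Run source in a fresh interpreter; return None on success, else error text."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timed out"
    finally:
        os.unlink(path)  # always clean up the temporary file
    if proc.returncode == 0:
        return None
    return proc.stderr or f"exit code {proc.returncode}"


def iterate(
    generate: Callable[[Optional[str]], str],  # hypothetical code producer
    max_rounds: int = 5,
) -> Optional[str]:
    """Ask for code, run it, and feed any error back until it runs or rounds run out."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        source = generate(feedback)
        feedback = run_candidate(source)
        if feedback is None:
            return source  # ran cleanly
    return None  # still failing after max_rounds
```

A clean exit only shows that the code runs; whether it does the right thing depends on the checks the candidate itself executes, such as tests.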

mathiaspoint 3 days ago

It definitely isn't. LLMs often end up stuck in weird corners they just don't get and need someone familiar with the theory of what they're working on to unstick them. If the agent is the same model as the code generator, it won't be able to do that on its own.

	brulard 3 days ago
		I was getting into a stuck state with Gemini and, to a lesser extent, with Sonnet 4, but my cases were resolved by Opus. I think it is mostly due to the size of the task; if you split it in advance into smaller chunks, all these models have a much higher probability of resolving it.
	sawjet 3 days ago
		Skill issue

nojs 3 days ago

Could you explain your exact Playwright setup in more detail? I’ve found that Claude really struggles to end-to-end test complex features that require browser use. It gets stuck for several minutes trying to find the right button to click, for example.

	brulard 3 days ago
		No special setup, just something along the lines of "test with Playwright" in the process list. It can get stuck, but for me it has not happened often enough to care. If it happens, I push it in the right direction.