brulard | 3 days ago
> These issues are inherent to the technology

That's simply false. Even if LLMs don't produce correct, valid code on the first shot 100% of the time, with an agent it's simply a matter of iteration. I have Claude Code connected to Playwright and to context7 for docs, so it can iterate by itself on syntax errors, runtime errors, or problems with the data on the backend side. Currently I see near zero cases where it fails to produce valid, working code. If it's incorrect in some aspect, it's not hard to steer it to a better solution or to fix it yourself.

And even if it failed at most of the stages of the plan, it's not all wasted time. I brainstormed ideas, formed the requirements and feature specifications, and have clear documentation, an implementation plan, unit tests, etc., which I can use to code it myself. So even in the worst-case scenario my development workflow is improved.
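For reference, the wiring is done through MCP servers in the project's .mcp.json. A minimal sketch of what that looks like (the package names, @playwright/mcp and @upstash/context7-mcp, are the ones I believe are current, but check their docs before copying):

    {
      "mcpServers": {
        "playwright": {
          "command": "npx",
          "args": ["@playwright/mcp@latest"]
        },
        "context7": {
          "command": "npx",
          "args": ["-y", "@upstash/context7-mcp"]
        }
      }
    }

With that in place, Claude Code can drive a real browser to verify its own changes and pull current library docs instead of relying on what's in its training data.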

mathiaspoint | 3 days ago | parent
It definitely isn't. LLMs often end up stuck in weird corners they just don't get, and they need someone familiar with the theory of what they're working on to unstick them. If the agent is the same model as the code generator, it won't be able to do that on its own.

nojs | 3 days ago | parent
Could you explain your exact Playwright setup in more detail? I've found that Claude really struggles to end-to-end test complex features that require browser use; it gets stuck for several minutes trying to find the right button to click, for example.