| ▲ | brulard 3 days ago |
| For me it was like this for about a year (using Cline + Sonnet & Gemini), until Claude Code came out and until I learned how to keep the context really clean. The key breakthrough was treating the AI as an architect/implementer rather than a code generator.
Most recently, I first ask CC to create a design document for what we are going to do. He has instructions to look into the relevant parts of the code and docs and to reference them. I review it, and after a few back-and-forths we have defined what we want to do. The next step is to chunk it into stages, and those into even smaller steps. All this may take a few hours, but once it is well defined, I clear the context. I then let him read the docs and implement one stage. This mostly goes well, and if it doesn't, I either try to steer him to correct it or, if it's too far off, I improve the docs and start the stage over. After a stage is complete, we commit, clear the context, and proceed to the next stage.
This way I spend maybe a day creating a feature that would otherwise take me maybe 2-3. And at the end we have a design document, unit tests, storybook pages, and the things that usually get overlooked, like accessibility and aria attributes. At the very end I like to have another model do a code review. Even if this didn't make me faster right now, I would consider it future-proofing myself as a software engineer, as these tools are improving quickly. |
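The per-stage loop described above can even be scripted. Below is a rough sketch, not the commenter's actual setup: the stage file names and prompts are invented, and it assumes Claude Code's headless `claude -p` mode, which starts a fresh context on each invocation (standing in for clearing the context by hand).

```python
import subprocess

# Hypothetical sketch of the stage-by-stage workflow described above.
# Assumes the plan was already chunked into files like docs/stages/01.md.
stages = ["docs/stages/01.md", "docs/stages/02.md", "docs/stages/03.md"]

for stage in stages:
    # Fresh context per stage: each headless `claude -p` run starts clean.
    subprocess.run(
        ["claude", "-p",
         f"Read docs/design.md and {stage}, then implement only that stage."],
        check=True,
    )
    # Commit before moving on, so a bad stage can be reverted in isolation.
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", f"Implement {stage}"], check=True)
```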
|
| ▲ | imiric 3 days ago | parent | next [-] |
| This is a common workflow that most advanced users are familiar with. Yet even following it to a T, and being really careful with how you manage context, the LLM will still hallucinate, generate non-working code, steer you in wrong directions and into dead ends, and just waste your time in most scenarios. There's no magical workflow or workaround for avoiding this. These issues are inherent to the technology, and have been since its inception.
The tools have certainly gotten more capable, and the ecosystem has matured greatly in the last couple of years, but these issues remain unsolved. The idea that people who experience them are not using the tools correctly is insulting.
I'm not saying that the current generation of this tech isn't useful. I've found it very useful for the same scenarios GP mentioned. But the above issues prevent me from relying on it for anything more sophisticated than that. |
| |
| ▲ | brulard 3 days ago | parent [-] |
| > These issues are inherent to the technology
That's simply false. Even if LLMs don't produce correct, valid code on the first shot 100% of the time, with an agent it's simply a matter of iterations. I have Claude Code connected to Playwright and to context7 for docs, so it can iterate by itself when there are syntax errors, runtime errors, or problems with the data on the backend side. Currently I have near zero cases where it does not produce valid working code. If it is incorrect in some aspect, it is not that hard to steer it to a better solution or to fix it yourself.
And even if it failed at implementing most of the stages of the plan, it's not all wasted time. I have brainstormed ideas, formed the requirements and feature specifications, and have clear documentation, an implementation plan, unit tests, etc., and I can use those to code it myself. So even in the worst-case scenario my development workflow is improved. |
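The "iterate by itself" part boils down to an outer loop: run the checks, feed failures back to the model, retry. A hypothetical sketch of that loop follows; the test command and prompt wording are assumptions about a project setup, not the commenter's actual configuration.

```python
import subprocess

MAX_ATTEMPTS = 5  # give up and hand control back to the human eventually

for attempt in range(MAX_ATTEMPTS):
    # Run the end-to-end checks (here assumed to be a Playwright test suite).
    result = subprocess.run(
        ["npx", "playwright", "test"], capture_output=True, text=True
    )
    if result.returncode == 0:
        print("All tests pass.")
        break
    # Hand the failure output back to the model and let it patch the code.
    subprocess.run(
        ["claude", "-p",
         "These Playwright tests failed; fix the code:\n"
         + (result.stdout + result.stderr)[-4000:]],
        check=True,
    )
else:
    print(f"Still failing after {MAX_ATTEMPTS} attempts; time to step in.")
```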
| ▲ | mathiaspoint 3 days ago | parent | next [-] |
It definitely isn't. LLMs often end up stuck in weird corners they just don't get, and they need someone familiar with the theory of what they're working on to unstick them. If the agent is the same model as the code generator, it won't be able to do that on its own. |
| ▲ | brulard 3 days ago | parent | next [-] |
I was getting into stuck states with Gemini, and to a lesser extent with Sonnet 4, but my cases were resolved by Opus. I think it is mostly down to the size of the task: if you split it into smaller chunks in advance, all these models have a much higher probability of resolving them. |
| ▲ | sawjet 3 days ago | parent | prev [-] |
Skill issue |
| |
| ▲ | nojs 3 days ago | parent | prev [-] |
Could you explain your exact Playwright setup in more detail? I've found that Claude really struggles to end-to-end test complex features that require browser use. It gets stuck for several minutes trying to find the right button to click, for example. |
| ▲ | brulard 3 days ago | parent [-] |
No special setup, just something along the lines of "test with playwright" in the process list. It can get stuck, but not often enough for me to care. If it happens, I push it in the right direction. |
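One detail that tends to help with the "can't find the right button" problem from the parent comment: Playwright's role-based locators resolve through the accessibility tree, so an agent doesn't have to guess CSS selectors. A minimal sketch (the URL and element names are invented for illustration):

```python
from playwright.sync_api import sync_playwright, expect

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://localhost:3000/settings")
    # Locate by accessible role and name instead of brittle CSS selectors.
    page.get_by_role("button", name="Save changes").click()
    # Assert on visible text, again via the accessibility tree.
    expect(page.get_by_text("Settings saved")).to_be_visible()
    browser.close()
```

This also rewards the accessibility work (aria attributes, labels) mentioned upthread: the more accessible the UI, the easier it is for the agent to drive.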
|
| ▲ | aatd86 3 days ago | parent | prev | next [-] |
| For me it's the opposite.
As long as I ask for small tasks or error checking, it can help.
But I'd rather think of the overall design myself because I tend to figure out corner cases or superlinear complexities much better.
I develop better mental models than the NNs. That's somewhat of a relief. Also, the longer the conversation goes, the less effective it gets. (Saturated context window?) |
| |
| ▲ | brulard 3 days ago | parent [-] |
I don't think that's the opposite. I have an idea of what I want and, to some extent, of how I want it to be done. The design document starts with a brainstorming session where I throw all my ideas at the agent and we iterate together.
> Also the longer the conversation goes, the less effective it gets. (saturated context window?)
Yes, this is exactly why I said the breakthrough came for me when I learned how to keep the context clean. That means that multiple times in the process I ask the model to put the relevant parts of our discussion into an MD document, which I may review and edit, and then I reset the context with /clear. Then I have him read just the relevant things from the MD docs and we continue. |
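That handoff can be approximated headlessly too, since each `claude -p` invocation starts with a fresh context, playing the role of /clear in the interactive session. A hypothetical sketch (the file name and prompts are invented):

```python
import subprocess

# 1. Before resetting, have the current session distill its state into a doc.
subprocess.run(
    ["claude", "-p",
     "Summarize the design decisions and open questions from our discussion "
     "into docs/handoff.md, listing only what the next stage needs."],
    check=True,
)

# (The human reviews and edits docs/handoff.md here.)

# 2. Fresh session, clean context: seed it with only the distilled doc.
subprocess.run(
    ["claude", "-p",
     "Read docs/handoff.md and implement the next stage of the plan."],
    check=True,
)
```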
|
| ▲ | john-tells-all 3 days ago | parent | prev | next [-] |
| I've seen this referred to as Chain of Thought. I've used it with great success a few times. https://martinfowler.com/articles/2023-chatgpt-xu-hao.html |
|
| ▲ | ramshanker 3 days ago | parent | prev [-] |
| Same here. A small variation: I explicitly use the website to manage what context it gets to see. |
| |
| ▲ | brulard 3 days ago | parent [-] |
What do you mean by website? An HTML doc? |
| ▲ | ramshanker 3 days ago | parent [-] |
I mean the websites of the AI providers: chatgpt.com, gemini.google.com, claude.ai, and so on. |
| ▲ | spaceywilly 3 days ago | parent [-] |
I've had more success this way as well. I will use the model via the web UI, paste in the relevant code, and ask it to implement something. It spits out the code, I copy it back into the IDE, and build. I tried Claude Code, but I find it goes off the rails too easily. I like the chat through the UI because it explains what it's doing like a senior engineer would. |
| ▲ | brulard 3 days ago | parent [-] |
Well, this is the way we have been able to do it for two years already, but you are basically acting as the transport layer for the process, which cannot be efficient. If you really want tight control over exactly what the LLM sees, then that's still an option. But you only get so far with this approach. |
|