Maybe it's because I spend a lot of time breaking up tasks beforehand to be highly specific and narrow, but I really don't run into issues like this at all.

A trivial example: whenever CC suggests doing more than one thing in planning mode, just have it focus on each task and subtask separately, bounding each one by a commit. Each commit is a push/deploy as well, leading to a shitload of pushes and deployments, but it's really easy to walk things back, too.

toenail 4 hours ago

I thought everybody does this. Having a model create anything that isn't highly focused only leads to technical debt. I have used models to create complex software, but I do architecture and code reviews, and they are very necessary.

	jkingsman 4 hours ago
		Absolutely. Effective LLM-driven development means you need to adopt the persona of an intern manager with a big corpus of dev experience. Your job is to enforce effective work-plan design, call out corner cases, proactively resolve ambiguity, demand written specs and call out when they're not followed, understand what is and is not within the agent's ability for a single turn (which is evolving fast!), etc.
	bityard 4 hours ago
		The use case that Anthropic pitches to its enterprise customers (my workplace is one) is that you pretty much tell CC what you want to do, then tell it to generate a plan, then send it away to execute it. Legitimized vibe-coding, basically. Of course they do say that you should review/test everything the tool creates, but in most contexts, it's sort of added as an afterthought.

lelanthran 3 hours ago

> Maybe it's because I spend a lot of time breaking up tasks beforehand to be highly specific and narrow, but I really don't run into issues like this at all.

I'm looking at the ticket opened, and you can't really be claiming that someone who did such a methodical deep dive into the issue, and presented a ton of supporting context to understand the problem, and further patiently collected evidence for this... does not know how to prompt well.

	aforwardslash an hour ago
		It's not about prompting; it's about planning and plan reviewing before implementing. I sometimes spend days iterating on the specification alone, then creating an implementation roadmap, and then finally iterating on the implementation plan before writing a single line of code. Just like any formal development pipeline. I started doing this a while ago (months) precisely because of issues like the ones described. On the other hand, analyzing prompts and deviations isn't that complex. Just ask Claude :)
	FergusArgyll an hour ago
		The methodical guy confused visible reasoning traces in the UI with reasoning tokens & used Claude to hallucinate a report
	phillipcarter 3 hours ago
		Sure I can.

itmitica 4 hours ago

I noticed a regression in review quality. You can try to break up the task all you want, but when it's crunch time, it takes a leaf from Gemini's book and silently quits trying and gets all sycophantic.

jonnycoder 4 hours ago

I do the same but I often find that the subtasks are done in a very lazy way.