phillipcarter 5 hours ago

Maybe it's because I spend a lot of time breaking up tasks beforehand to be highly specific and narrow, but I really don't run into issues like this at all.

A trivial example: whenever CC suggests doing more than one thing in a planning mode, just have it focus on each task and subtask separately, bounding each one by a commit. Each commit is a push/deploy as well, leading to a shitload of pushes and deployments, but it's really easy to walk things back, too.
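A rough sketch of that one-subtask-one-commit loop, for illustration only. The function name, the example plan, and the commit messages are all made up, and nothing is executed here: the code only builds the git commands that would bound each subtask.

```python
from typing import List


def commands_for_plan(subtasks: List[str]) -> List[List[str]]:
    """Build the git commands for a plan: one commit and one push per subtask."""
    commands: List[List[str]] = []
    for subtask in subtasks:
        # Hypothetical step: the agent works on exactly this one subtask first.
        commands.append(["git", "add", "-A"])
        # The subtask description doubles as the commit message (an assumption).
        commands.append(["git", "commit", "-m", subtask])
        # Each commit is pushed on its own, so every subtask is a separate deploy.
        commands.append(["git", "push"])
    return commands


if __name__ == "__main__":
    plan = ["add login form", "validate login input"]  # made-up example plan
    for cmd in commands_for_plan(plan):
        print(" ".join(cmd))
```

Because every subtask ends in its own commit, walking one back is a `git revert` of that single commit rather than untangling a large change.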

toenail 4 hours ago | parent | next [-]

I thought everybody does this.. having a model create anything that isn't highly focused only leads to technical debt. I have used models to create complex software, but I do architecture and code reviews, and they are very necessary.

jkingsman 4 hours ago | parent | next [-]

Absolutely. Effective LLM-driven development means you need to adopt the persona of an intern manager with a big corpus of dev experience. Your job is to enforce effective work-plan design, call out corner cases, proactively resolve ambiguity, demand written specs and call out when they're not followed, understand what is and is not within the agent's ability for a single turn (which is evolving fast!), etc.

bityard 4 hours ago | parent | prev | next [-]

The use case that Anthropic pitches to its enterprise customers (my workplace is one) is that you pretty much tell CC what you want to do, then tell it to generate a plan, then send it away to execute it. Legitimized vibe-coding, basically.

Of course they do say that you should review/test everything the tool creates, but in most contexts, it's sort of added as an afterthought.

lelanthran 3 hours ago | parent | prev | next [-]

> Maybe it's because I spend a lot of time breaking up tasks beforehand to be highly specific and narrow, but I really don't run into issues like this at all.

I'm looking at the ticket opened, and you can't really be claiming that someone who did such a methodical deep dive into the issue, and presented a ton of supporting context to understand the problem, and further patiently collected evidence for this... does not know how to prompt well.

aforwardslash an hour ago | parent | next [-]

It's not about prompting; it's about planning and plan reviewing before implementing. I sometimes spend days iterating on the specification alone, then creating an implementation roadmap, and then finally iterating on the implementation plan before writing a single line of code. Just like any formal development pipeline.

I started doing this a while ago (months) precisely because of issues as described.

On the other hand, analyzing prompts and deviations isn't that complex... just ask Claude :)

FergusArgyll an hour ago | parent | prev | next [-]

The methodical guy confused visible reasoning traces in the UI with reasoning tokens & used Claude to hallucinate a report.

phillipcarter 3 hours ago | parent | prev [-]

Sure I can.

itmitica 4 hours ago | parent | prev | next [-]

I noticed a regression in review quality. You can break the task up all you want, but when it's crunch time, it takes a leaf from Gemini's book, silently quits trying, and gets all sycophantic.

jonnycoder 4 hours ago | parent | prev [-]

I do the same but I often find that the subtasks are done in a very lazy way.