Remix.run Logo
ecshafer a day ago

What are you writing that Claude is actually writing all of it? Every time I get past the green field stage, I just end up throwing out what it writes half the time since its trash. Claude seems really great at fix this unit test, generate this boiler plate, take this uml and build this framework out. But when I am doing refactorings, or implementing things that are beyond monotonous, I end up writing it all by hand. My best luck is still do the design, query AI for possible choices, sketch out the framework of what I am writing, have AI critique my plan, and then have AI design individual methods, then fix what it writes.

tedmiston 4 hours ago | parent | next [-]

> What are you writing that Claude is actually writing all of it? Every time I get past the green field stage, I just end up throwing out what it writes half the time since its trash.

For the current state of frontier models, you need to break the steps down so that the LLM understands a process like what you might go through as you expect it (which is often different for everyone).

i.e., get it to agree to a spec, then get it to agree to a build plan, agree on unit test signatures, UI etc as needed, then let it build, ...

"Prompt engineering"

cognitiveinline 19 hours ago | parent | prev | next [-]

What you say could be theoretically possible, but it's probably an issue with your usage of if. For eg: if any of this hard non-promptable project is available on github, or you've seen this problem in any large scale github project, you can share that. I've rarely seen a repo and a problem that claude can't chew through with the right prompt.

nbaksalyar 13 hours ago | parent [-]

People keep saying things like

> it's probably an issue with your usage of if

> I've rarely seen a repo and a problem that claude can't chew through with the right prompt

> a skill/PEBKAC issue

But then I remember how Anthropic couldn't fix the flickering issue for many months. It just does not compute.

Is it that people working at Anthropic can't prompt and it's a "skill issue" too? I mean, the terminal does not flicker in a lot of other complex TUI apps that I use every day - Midnight Commander, Emacs, tmux, etc. These are open source, Claude could be prompted to "just do what Midnight Commander does". So what is it?

tedmiston 4 hours ago | parent [-]

What does a random bug in one LLM's frontend app have to do with learning how to do prompt engineering well?

clates 15 hours ago | parent | prev [-]

I mean this with no disrespect, but

> Every time I get past the green field stage, I just end up throwing out what it writes half the time since its trash.

Is a skill/PEBKAC issue. You still need to exercise engineering best-practices like decomposing work to the smallest unit before taking a task on, brainstorming design first and implementation last, clearly defining your success criteria and requirements before beginning any work, etc.

I'm on a >10yr old codebase and have been able to get my org to orchestrate entire features, fully unit tested, e2e tested, storybooked, from scratch without touching an IDE. Refactorings and the endless mountain of 80% completed migrations from one pattern to another are now trivially able to offload.

Point your SOTA de jeur at the original docs, a few of the original examples/PRs and have it draft a skill describing the work, the scope, and the success metrics. Iterate on the skill with the main agent by subagenting to test the skill until you are happy with the result and it mostly gets it right with the guardrails you've defined. Again - keep the scope extremely small. It gives much less rope for the agents to hang themselves with and it is less cognitive load when you have to review/test the PR.

Then set up a reasonable cadence for it to execute an autonomous thread on and review when you get comfortable.

----

The issue I've been running into lately is simply that we've got so many PRs coming in that actually doing thorough human reviews on them is not sustainable relative to the rate the team is creating agents to open them and people (especially juniors and mid level) are getting burned out by essentially having entire days where they are just doing code reviews.