Remix.run Logo
lkjdsklf 2 days ago

The problem is, by the time you’ve gone through the process of making a granular plan and all that, you’ve lost all productivity gains of using the agent.

As an engineer, especially as you get more experience, you can kind of visualize the plan for a change very quickly and flesh out the next step while implementing the current step

All you have really accomplished with the kind of process described is make the worlds least precise, most verbose programming language

Benjammer 2 days ago | parent | next [-]

I'm not sure how much experience you have, I'm not trying to make assumptions, but I've been working in software over 15 years. The exact skill you mentioned - can visualize the plan for a change quickly - is what makes my LLM usage so powerful, imo.

I can say the right precise wording in my prompt to guide it to a good plan very quickly. As the other commenter mentioned, the entire above process only takes something like 30-120 minutes depending on scope, and then I can generate code in a few minutes that would take 2-6 weeks to write myself, working 8 hr days. Then, it takes something like 0.5-1.5 days to work out all the bugs and clean up the weird AI quirks and maybe have the LLM write some playwright tests or whatever testing framework you use for integration tests to verify it's own work.

So yes, it takes significant time to plan things well for good results, and yes the results are often sloppy in some parts and have weird quirks that no human engineer would make on purpose, but if you stick to working on prompt/context engineering and getting better and faster at the above process, the key unlock is not that it just does the same coding for you, with it generating the code instead. It's that you can work as a solo developer at the abstraction level of a small startup company. I can design and implement an enterprise grade SSO auth system over a weekend that integrates with Okta and passes security testing. I can take a library written in one language and fully re-implement it in another language in a matter of hours. I recently took the native libraries for Android and iOS for a fairly large, non-trivial SDK, and had Claude build me a React Native wrapper library with native modules that integrates both natives libraries and presents a clean, unified interface and typescript types to the react native layer. This took me about two days, plus one more for validation testing. I have never done this before. I have no idea how "Nitro Modules" works, or how to configure a react native library from scratch. But given the immense scaffolding abilities of LLMs, plus my debugging/hacking skills, I can get to a really confident place, really quickly and ship production code at work with this process, regularly.

adastra22 2 days ago | parent | prev [-]

It takes maybe 30min and then it can go off and generate code that would take literal weeks for me to write. There are still huge productivity gains being had.

lkjdsklf 2 days ago | parent [-]

That has not been my experience at all.

It takes 30-40 minutes to generate a plan and it generates code that would have taken 20-30 minutes to write.

When it’s generating “weeks” worth of code, it inevitably goes off the rails and the crap you get goes in the garbage.

This isn’t to say agents don’t have their uses, but i have not seen this specific problem actually work. They’re great for refactoring (usually) and crapping out proof of concepts and debugging specific problems. It’s also great for exploring a new code base where you have little prior knowledge.

It makes sense that it sucks at generating large amounts of code that fits cohesively into the project. The context is too small. My code base is millions of lines of code. My brain has a shitload more of that in context than any of the models. So they have to guess and check and end up incorrect and poor and i don’t. I know which abstractions exist that i can use. It doesn’t. Sometimes it guesses right. Often Times it doesn’t. And once it’s wrong, it’s fucked for the entire rest of the session so you just have to start over

adastra22 2 days ago | parent [-]

Works for me. Not vanilla Claude code though- you need to put some work into generating slash commands and workflows that keep it on task and catch the bad stuff.

Take this for example: https://www.reddit.com/r/ClaudeAI/comments/1m7zlot/how_planm...

This trick is just the basic stuff, but it works really well. You can add on and customize from there. I have a “/task” slash command that will run a full development cycle with agents generating code, many more (12-20) agent critics analyzing the unstaged work, all orchestrated by a planning agent that breaks the complex task into small atomic steps.

The first stage of this project (generating the plan) is interactive. It can then go off and make 10kLOC code spread over a dozen commits and the quality is good enough to ship, most of the time. If it goes off the rails, keep the plan document but nuke the commits and restart. On the Claude MAX plan this costs nothing.

This is how I do all my development now. I spend my time diagnosing agent failures and fixing my workflows, not guiding the agent anymore (other than the initial plan document).

I still review every line of code before pushing changes.