A greenfield project is definitely 'easy mode' for an LLM; especially if the problem area is well understood (and documented).

Opus is great and definitely speeds up development even in larger code bases and is reasonably good at matching coding style/standard to that of of the existing code base.

In my opinion, the big issue is the relatively small context that quickly overwhelms the models when given a larger task on a large codebase.

For example, I have a largish enterprise grade code base with nice enterprise grade OO patterns and class hierarchies. There was a simple tech debt item that required refactoring about 30-40 classes to adhere to a slightly different class hierarchy. The work is not difficult, just tedious, especially as unit tests need to be fixed up.

I threw Opus at it with very precise instructions as to what I wanted it to do and how I wanted it to do it. It started off well but then disintegrated once it got overwhelmed at the sheer number of files it had to change. At some point it got stuck in some kind of an error loop where one change it made contradicted with another change and it just couldn't work itself out. I tried stopping it and helping it out but at this point the context was so polluted that it just couldn't see a way out. I'd say that once an LLM can handle more 'context' than a senior dev with good knowledge of a large codebase, LLM will be viable in a whole new realm of development tasks on existing code bases. That 'too hard to refactor this/make this work with that' task will suddenly become viable.

▲

Sammi 2 days ago | parent | next [-]

I just did something similar and it went swimmingly by doing this: Keep the plan and status in an md file. Tell it to finish one file at a time and run tests and fix issues and then to ask whether to proceed with the next file. You can then easily start a new chat with the same instructions and plan and status if the context gets poisoned.

	▲	koyote 2 days ago \| parent [-]
		I might give that a go in the future, but in this case it would've been faster for me to just do the work than to coach it for each file. Also as this was an architectural change there are no tests to run until it's done. Everything would just fail. It's only done when the whole thing is done. I think that might be one of the reasons it got stuck: it was trying to solve issues that it did not prove existed yet. If it had just finished the job and run the tests it would've probably gotten further or even completed it. It's a bit like stopping half way through renaming a function and then trying to run the tests and finding out the build does not compile because it can't find 'old_function'. You have to actually finish and know you've finished before you can verify your changes worked. I still haven't actually addressed this tech debt item (it's not that important :)). But I might try again and either see if it succeeds this time (with plan in an md) or just do the work myself and get Opus to fix the unit tests (the most tedious part).

▲

pigpop 2 days ago | parent | prev | next [-]

You have to think of Opus as a developer whose job at your company lasts somewhere between 30 to 60 minutes before you fire them and hire a new one.

Yes, it's absurd but it's a better metaphor than someone with a chronic long term memory deficit since it fits into the project management framework neatly.

So this new developer who is starting today is ready to be assigned their first task, they're very eager to get started and once they start they will work very quickly but you have to onboard them. This sounds terrible but they also happen to be extremely fast at reading code and documentation, they know all of the common programming languages and frameworks and they have an excellent memory for the hour that they're employed.

What do you do to onboard a new developer like this? You give them a well written description of your project with a clear style guide and some important dos and don'ts, access to any documentation you may have and a clear description of the task they are to accomplish in less than one hour. The tighter you can make those documents, the better. Don't mince words, just get straight to the point and provide examples where possible.

The task description should be well scoped with a clear definition of done, if you can provide automated tests that verify when it's complete that's even better. If you don't have tests you can also specify what should be tested and instruct them to write the new tests and run them.

For every new developer after the first you need a record of what was already accomplished. Personally, I prefer to use one markdown document per working session whose filename is a date stamp with the session number appended. Instruct them to read the last X log files where X is however many are relevant to the current task. Most of the time X=1 if you did a good job of breaking down the tasks into discrete chunks. You should also have some type of roadmap with milestones, if this file will be larger than 1000 lines then you should break it up so each milestone is its own document and have a table of contents document that gives a simple overview of the total scope. Instruct them to read the relevant milestone.

Other good practices are to tell them to write a new log file after they have completed their task and record a summary of what they did and anything they discovered along the way plus any significant decisions they made. Also tell them to commit their work afterwards and Opus will write a very descriptive commit message by default (but you can instruct them to use whatever format you prefer). You basically want them to get everything ready for hand-off to the next 60 minute developer.

If they do anything that you don't want them to do again make sure to record that in CLAUDE.md. Same for any other interventions or guidance that you have to provide, put it in that document and Opus will almost always stick to it unless they end up overfilling their context window.

I also highly recommend turning off auto-compaction. When the context gets compacted they basically just write a summary of the current context which often removes a lot of the important details. When this happens mid-task you will certainly lose parts of the context that are necessary for completing the task. Anthropic seems to be working hard at making this better but I don't think it's there yet. You might want to experiment with having it on and off and compare the results for yourself.

If your sessions are ending up with >80% of the context window used while still doing active development then you should re-scope your tasks to make them smaller. The last 20% is fine for doing menial things like writing the summary, running commands, committing, etc.

People have built automated systems around this like Beads but I prefer the hands-on approach since I read through the produced docs to make sure things are going ok and use them as a guide for any changes I need to make mid-project.

With this approach I'm 99% sure that Opus 4.5 could handle your refactor without any trouble as long as your classes aren't so enormous that even working on a single one at a time would cause problems with the context window, and if they are then you might be able to handle it by cautioning Opus to not read the whole file and to just try making targeted edits to specific methods. They're usually quite good at finding and extracting just the sections that they need as long as they have some way to know what to look for ahead of time.

Hope this helps and happy Clauding!

▲

suzzer99 2 days ago | parent | next [-]

> You have to think of Opus as a developer whose job at your company lasts somewhere between 30 to 60 minutes before you fire them and hire a new one.

I am stealing the heck out of this.

	▲	pigpop 2 days ago \| parent [-]
		Please go ahead, I'm honoured!

▲

pigpop 2 days ago | parent | prev [-]

Follow up: Opus is also great for doing the planning work before you start. You can use plan mode or just do it in a web chat and have them create all of the necessary files based on your explanation. The advantage of using plan mode is that they can explore the codebase in order to get a better understanding of things. The default at the end of plan mode is to go straight into implementation but if you're planning a large refactor or other significant work then I'd suggest having them produce the documentation outlined above instead and then following the workflow using a new session each time. You could use plan mode at the start of each session but I don't find this necessary most of the time unless I'm deviating from the initial plan.

▲

edg5000 2 days ago | parent | prev [-]

This will work (if you add more details):

"Have an agent investiate issue X in modules Y and Z. The agent should place a report at ./doc/rework-xyz-overview.md with all locations that need refactoring. Once you have the report, have agents refactor 5 classes each in parallel. Each agent writes a terse report in ./doc/rework-xyz/ When they are all done, have another agent check all the work. When that agent reports everything is okay, perform a final check yourself"

▲

gck1 2 days ago | parent [-]

And you can automate all this so that it happens every time. I have an `/implement` command that is basically instructed to launch the agents and then do back and forth between them. Then there's a claude code hook that makes sure that all the agents, including the orchestrator and the agents spawned have respected their cycles - it's basically running `claude` with a prompt that tells it to read the plan file and see if the agents have done what they were expected in this cycle - gets executed automatically on each agent end.

	▲	edg5000 a day ago \| parent [-]
		Interesting. Another thing I'll try is editing the system propmts. There are some projects floating around that can edit the minified JavaScript in the client. I also noticed that the "system tools" prompts take up ~5% context (10 ktok).