mstank 4 hours ago
As the models have progressively improved (able to handle more complex code bases, longer files, etc.), I've started using this simple framework on repeat, which seems to work pretty well at one-shotting complex fixes or new features.

[Research] Ask the agent to explain current functionality as a way to load the right files into context.

[Plan] Ask the agent to brainstorm the best-practices way to implement a new feature or refactor. "Brainstorm" seems to be a keyword that triggers a better questioning loop for the agent. Ask it to write a detailed implementation plan to an md file.

[Clear] Completely clear the agent's context; this gives better results than just compacting the conversation.

[Execute plan] Ask the agent to review the specific plan again; sometimes it will ask additional questions, which repeats the planning phase. This loads only the plan into context. Then have it implement the plan.

[Review & test] Clear the context again and ask it to review the plan to make sure everything was implemented. This is where I add any unit or integration tests if needed. Also run test suites, type checks, lint, etc.

With this loop I've often had it run for 20-30 minutes straight and end up with usable results. It's become a game of context management and creating a solid testing feedback loop instead of trying to purely one-shot issues.
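The loop above maps fairly naturally onto Claude Code's non-interactive print mode, since each `claude -p` invocation starts from a fresh context (standing in for the [clear] steps). A rough sketch only, assuming the `claude` CLI is installed; the feature, prompts, and `PLAN.md` filename are invented placeholders, not the commenter's actual setup:

```shell
#!/bin/sh
# Hypothetical sketch of the research -> plan -> clear -> execute -> review loop.
set -e

# [Research] load the right files into context and get an explanation
claude -p "Explain how the current export pipeline works. Cite the relevant files."

# [Plan] brainstorm, then persist the plan so it survives the context reset
claude -p "Brainstorm the best-practices way to add CSV export, then write a detailed implementation plan to PLAN.md."

# [Execute] a fresh invocation means a fresh context; only the plan is loaded
claude -p "Read PLAN.md, ask any clarifying questions, then implement the plan."

# [Review & test] fresh context again; verify the work against the plan
claude -p "Read PLAN.md and verify every step was implemented. Run the test suite, type checks, and lint."
```

The interactive equivalent is running `/clear` between the plan and execute phases instead of starting new invocations.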
jarjoura 2 hours ago
As of Dec 2025, Sonnet/Opus and GPTCodex are both trained for this, and most good agent tools (i.e. opencode, claude-code, codex) have prompts to fire off subagents during an exploration (use the word "explore"), so you should be able to do the Research step without the extra work of writing plans and resetting context. I'd save that expense unless you need some huge multi-step verifiable plan implemented.

The biggest gotcha I've found is that these LLMs love to write code as if it were C or Python, just in your language of choice. Instead of considering that something should be encapsulated in an object to maintain state, they will write five functions, passing the state as parameters between each one. They will also consistently ignore most of the code around them, even when reading it would show what specifically could be reused. So you end up with copy-pasta code, and unstructured copy-pasta at best.

The other gotcha is that Claude usually ignores CLAUDE.md. So I first prompt it to read that file, and then I prompt it to explore. With those two rules in place, it usually does a good job following my request to fix a bug, add a new feature, or whatever, all within a single context. These recent agents do a much better job of throwing away useless context.

I do think the older models and agents get better results when writing things to a plan document, but I've noticed recent Opus and Sonnet usually end up just writing the same code into the plan document anyway. That usually ends up confusing the model, because it can't connect the plan to the code around the changes as easily.
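The state-passing anti-pattern is easy to picture. A contrived Python sketch (all names invented for illustration): the first shape is what the agent tends to produce, threading a config dict through every call; the second keeps the same state inside one object.

```python
# Shape the agent tends to produce: free functions threading state as parameters.
def parse_config(path):
    # stand-in for actually reading `path`
    return {"retries": 3, "timeout": 10}

def apply_overrides(config, overrides):
    config.update(overrides)
    return config

def build_client(config):
    return f"client(retries={config['retries']}, timeout={config['timeout']})"

# The same logic with the state encapsulated, so callers never shuttle the dict.
class ClientBuilder:
    def __init__(self, path):
        self.config = {"retries": 3, "timeout": 10}  # stand-in for reading `path`

    def apply_overrides(self, overrides):
        self.config.update(overrides)
        return self  # chainable: state lives in the object, not in parameters

    def build(self):
        return f"client(retries={self.config['retries']}, timeout={self.config['timeout']})"

# Both produce the same client; only one threads state by hand.
flat = build_client(apply_overrides(parse_config("app.toml"), {"timeout": 30}))
oo = ClientBuilder("app.toml").apply_overrides({"timeout": 30}).build()
print(flat == oo)  # → True
```

In a language with real encapsulation idioms the second form also gives later edits an obvious place to land, which is exactly what the copy-pasta output lacks.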
godzillafarts 2 hours ago
This is effectively what I'm doing, inspired by HumanLayer's Advanced Context Engineering guidelines: https://github.com/humanlayer/advanced-context-engineering-f...

We've taken those prompts, tweaked them to be more relevant to us and our stack, and pulled them in as custom commands that can be executed in Claude Code, i.e. `/research_codebase`, `/create_plan`, and `/implement_plan`. It's working exceptionally well for me; it helps that I'm very meticulous about reviewing the output and correcting it during the research and planning phases. Aside from a few use cases with mixed results, it hasn't really taken off across our team, unfortunately.
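For context, Claude Code picks up custom slash commands from markdown files in `.claude/commands/`: the filename becomes the command name and the file body becomes the prompt, with `$ARGUMENTS` replaced by whatever follows the command. A hypothetical, much-abbreviated `.claude/commands/create_plan.md` (the actual HumanLayer prompts are far longer and more specific):

```markdown
Brainstorm best-practice approaches for: $ARGUMENTS

1. Re-read the relevant research notes and source files before proposing anything.
2. Ask me clarifying questions until the requirements are unambiguous.
3. Write a detailed, step-by-step implementation plan to a markdown file,
   including files to change, risks, and how each step will be verified.

Do not write any implementation code yet.
```

Invoked as `/create_plan add CSV export`, with the trailing text substituted for `$ARGUMENTS`.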
asim 3 hours ago
I don't do any of that. I find with GitHub Copilot and Claude Sonnet 4.5, if I'm clear enough about the what and the where, it'll sort things out pretty well, and then there's only reiteration of code styling or reuse of functionality. At that point it has enough context to keep going. The only time I might clear the whole thing is if I'm working on an entirely new feature, where the context is too large and it gets stuck summarising the history. Otherwise it's good. But this is in Codespaces.

I find the Tasks feature much harder. It's almost a write-off when trying to do something big. Twice I've had it go off on some strange tangent and build the most absurd thing. You really need to keep your eyes on it.
zingar 2 hours ago
I'm uneasy having an agent implement several pages of plan and then write tests and results only at the end of all that. It feels like getting a CS student to write and follow a plan for something they haven't worked on before. It'll report, "Numbers changed in step 6a, therefore it worked" [forgetting the pivotal role of step 2, which failed, and as a result of which the agent should have taken step 6b, not 6a]. Or "there is conclusive evidence that X is present and therefore we were successful" [X is discussed in the plan as the reason why action is NEEDED, not as a success criterion].

I _think_ what is going wrong is context overload, and my remedy is to have the agent update every step of the plan with results immediately after acting on it, before moving on to the next step. When things seem off, I can then clear context and have the agent review the results step by step to debug its own work: "Review step 2 of the results. Are the stated results consistent with the final conclusions? Quote lines from the results verbatim as evidence."
prmph 3 hours ago
Nothing will really work when the models fail at the most basic reasoning challenges. I've had models do the complete opposite of what I put in the plan and guidelines. I've had them re-read the exact sentences and still come to the opposite conclusion, and my instructions are nothing complex at all.

I used to think one could build a workflow and process around LLMs that extracts good value from them consistently, but I'm now not so sure. I notice that sometimes the model will be in a good state and do a long chain of edits of good quality. The problem is, it's still a crap-shoot how to get it into that state.
AlexB138 3 hours ago
This is essentially my exact workflow. I also keep the plan markdown files around in the repo to refer agents back to when adding new features. I have found it to be a really effective loop, and a great way to reprime context when returning to features.
dfsegoat 2 hours ago
Highly recommend using agent-based hooks for things like [review & test]. At a basic level they work akin to git hooks, but they fire up a whole new context whenever certain events trigger (e.g. another agent finishes implementing changes), and that hook instance is independent of the implementation context, which is great for the review case: it acts as a semi-independent reviewer.
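In Claude Code, for example, hooks are configured under `hooks` in `.claude/settings.json` and run a shell command when an event fires. A minimal sketch, assuming the `PostToolUse` event and command-type hooks from the hooks documentation; the matcher and test command are placeholders:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm test --silent" }
        ]
      }
    ]
  }
}
```

To get the independent-reviewer behavior described above, the hook command could itself launch a fresh agent (e.g. `claude -p "review the latest changes against PLAN.md"`), which starts with none of the implementer's context.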
zeroCalories 2 hours ago
I agree this can work okay, but once I find myself doing this much handholding I'd rather drive the process myself. Coordinating four agents and guiding them along really makes you appreciate The Mythical Man-Month, on the scale of hours.