storystarling 13 hours ago

How did you handle the context window for 20k lines? I assume you aren't feeding the whole codebase in every time given the API costs. I've struggled to keep agents coherent on larger projects without blowing the budget, so I'm curious if you used a specific scoping strategy here.

simonw 12 hours ago | parent | next [-]

GPT-5.2 has a 400,000 token context window. Claude Opus 4.5 is just 200,000 tokens. To my surprise this doesn't seem to limit their ability to work with much larger codebases - the coding agent harnesses have got really good at grepping for just the code that they need to have in-context, similar to how a human engineer can make changes to a million lines of code without having to hold it all in their head at once.
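The grep-first retrieval described here can be sketched with plain shell. The files and symbol names below are made up for illustration; the point is that the harness locates a symbol first, then reads only a small window of the one file that matters:

```shell
# Stand-in codebase with two hypothetical files.
mkdir -p demo/src
printf 'def login(u, p):\n    return check_password(u, p)\n' > demo/src/auth.py
printf 'def check_password(u, p):\n    return True\n' > demo/src/db.py

# Step 1: locate the symbol instead of loading every file into context.
grep -rn "def check_password" demo/src

# Step 2: read only a small window of the one matching file.
sed -n '1,5p' demo/src/db.py
```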

storystarling 11 hours ago | parent [-]

That explains the coherence, but I'm curious about the mechanics of the retrieval. Is it AST-based to map dependencies or are you just using vector search? I assume you still have to filter pretty aggressively to keep the token costs viable for a commercial tool.

simonw 10 hours ago | parent [-]

No vector search, just grep.

embedding-shape 10 hours ago | parent | prev | next [-]

I didn't; Codex (the TUI/CLI) did, all by itself. I have one REQUIREMENTS.md that is specific to the project and an AGENTS.md that I reuse across most projects. Then I give Codex (gpt-5.2 with reasoning effort set to xhigh) a prompt plus a screenshot, tell it to get it working somewhat similarly, wait until it completes, review that it worked, then continue.

Most of the time when I develop professionally, I restart the session after each successful change. For this project I initially tried to let one session go as long as possible, but eventually I reverted to my old behavior of restarting from zero after each successful change.

For knowing which files it should read or write, it uses `ls`, `tree`, and `ag` most commonly. There is no out-of-band indexing or anything, just a unix shell controlled by an LLM via tool calls.
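That discovery flow can be reproduced by hand. The `proj/` layout here is hypothetical, and `grep -rl` stands in for `ag` (the silver searcher), which may not be installed:

```shell
# Hypothetical project for the walkthrough.
mkdir -p proj/src
printf 'def render():\n    pass\n' > proj/src/view.py
printf 'def save():\n    pass\n'   > proj/src/model.py

# What the agent typically runs first: a cheap structural overview.
ls -R proj

# Then a symbol search to decide which file is worth opening at all.
grep -rl "def save" proj
```

The second command returns file names only, so the agent spends almost no tokens before committing to reading anything.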

nurettin 12 hours ago | parent | prev [-]

You don't load the entire project into the context. You let the agent work on a few 600-800 line files one feature at a time.

storystarling 11 hours ago | parent [-]

Right, but how does it know which files to pick? I'm curious if you're using a dependency graph or embeddings for that discovery step, since getting the agent to self-select the right scope is usually the main bottleneck.

embedding-shape 10 hours ago | parent | next [-]

I gave you a more complete answer here: https://news.ycombinator.com/item?id=46787781

> since getting the agent to self-select the right scope is usually the main bottleneck

I haven't found this to ever be the bottleneck. What agent and model are you using?

nurettin 3 hours ago | parent | prev [-]

If you don't trigger the discovery agents, the Claude CLI uses a search tool and greps 50-100 lines at a time. If discovery is triggered, Claude sends multiple agents into the code with different tasks, and they return with overall architecture notes.
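The 50-100 line windows mentioned above look roughly like this in shell terms; the file name and line range are illustrative, not what Claude actually runs:

```shell
# Generate a 200-line stand-in source file.
seq 1 200 | sed 's/^/line /' > big_file.txt

# Read a bounded window instead of the whole file, so the
# context cost is proportional to the slice, not the file.
sed -n '50,100p' big_file.txt
```

A 51-line slice like this costs the same in tokens whether the underlying file is 200 lines or 20,000.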