The browser it built, obviously the context window of the entire project is huge. They mention loads of parallel agents in the blog post, so I guess each agent is given a module to work on, and some tests? And then a 'manager' agent plugs this in without reading the code? Otherwise I can't see how, even with ChatGPT 5.2/Gemini 3, you could do this otherwise? In retrospect it seems an obvious approach and akin to how humans work in teams, but it's still interesting.

▲

simonw 7 hours ago | parent | next [-]

GPT-5.2-Codex has a 400,000 token window. Claude 4.5 Opus is half of that, 200,000 tokens.

It turns out to matter a whole lot less than you would expect. Coding Agents are really good at using grep and writing out plans to files, which means they can operate successfully against way more code than fits in their context at a single time.

	▲	jaggederest 5 hours ago \| parent [-]
		The other issue with "a huge token window" is that if you fill it, it seems like relevance for any specific part of the window is diminished - which makes it hard to override default model behavior. Interestingly, recently it seems to me like codex is actually compressing early and often so that it stays in the smarter-feeling reasoning zone of the first 1/3rd of the window, which is a neat solution for this, albeit with the caveat of post-compression behavior differences cropping up more often.

▲

observationist 7 hours ago | parent | prev | next [-]

Get a good "project manager" agents.md and it changes the whole approach of vibe coding. For a professional environment, with each person given a little domain, arranged in the usual hierarchy of your coding team, truly amazing things can get done.

Presumably the security and validation of code still needs work, I haven't read anything that indicates those are solved yet, so people still need to read and understand the code, but we're at the "can do massive projects that work" stage.

Division of labor and planning and hierarchy are all rapidly advancing, the orchestration and coordination capabilities are going to explode in '26.

▲

nl 5 hours ago | parent | prev | next [-]

Generally they only load a bit of the project into the context at a time. Grep works really well for working out what.

▲

galaxyLogic 6 hours ago | parent | prev [-]

> so I guess each agent is given a module to work on, and some tests?

Who created those agents and gives them the tasks to work on. Who created the tests? AI, or the humans?