trjordan 7 days ago

This is all true, and we've prototyped a number of these things at my current startup. You need to be pretty deliberate about implementing them.

For a counter-example, consider Claude Code:

- 1 long context window, with (at most) 1 sub-agent

- same tools available at all times, and to the sub-agent (except spawning a sub-sub-agent)

- Full context stays in conversation, until you hit the context window limit. Compaction is automatic but extremely expensive. Quality absolutely takes a dive until everything is re-established.

- Deterministic lookup of content. Claude reads files with tools; it doesn't include random chunks pulled in by RAG cosine similarity.
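The single-level sub-agent design in the list above can be sketched in a few lines. This is a hypothetical illustration, not Claude Code's actual implementation: the names (`run_agent`, `run_task`, `TOOLS`) are made up, and the LLM loop is stubbed out. The point is just that the sub-agent gets the parent's full toolset minus the spawn tool, so recursion stops at one level.

```python
def read_file(path: str) -> str:
    """Stand-in for a real file-reading tool."""
    return f"<contents of {path}>"

def run_task(prompt: str) -> str:
    """Spawn one sub-agent with every tool except this one,
    so a sub-agent can never spawn a sub-sub-agent."""
    sub_tools = {name: fn for name, fn in TOOLS.items() if name != "task"}
    return run_agent(prompt, sub_tools)

TOOLS = {"read_file": read_file, "task": run_task}

def run_agent(prompt: str, tools: dict) -> str:
    """Stand-in for the LLM loop; here it just reports its toolset."""
    return f"agent({prompt!r}) with tools {sorted(tools)}"
```

The restriction lives in the tool wiring, not in the prompt, which is what makes it deterministic.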

I could go on. In my experience, if you're going to use these techniques 1) maybe don't and 2) turn up the determinism to 11. Get really specific about _how_ you're going to use them, and why, in each specific case.

For example, we're working on code migrations [0]. We have a tool that reads changelogs, migration guides, and OSS source. Those can be verbose, so they blow the context window on even 200k models. But we're not just randomly deleting things out of the "plan my migration" context; we're exposing a tool that deliberately lets the model pull out the breaking changes. This is "Context Summarization," but before using it, we had to figure out that _those_ bits were what was blowing the context, and _then_ summarize them. All our attempts at generically pre-summarizing content just resulted in poor performance, because we were hiding information from the agent.
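The targeted-extraction idea above might look something like this sketch. Everything here is illustrative (the function name, the heading-based heuristic, the sample changelog are all assumptions, not Tern's actual tool): rather than generically summarizing a verbose document, you expose a tool that pulls out only the sections the task actually needs.

```python
import re

def extract_breaking_changes(changelog: str) -> str:
    """Keep only the sections whose heading mentions breaking changes."""
    sections = re.split(r"\n(?=## )", changelog)
    keep = [s for s in sections
            if re.search(r"breaking", s.splitlines()[0], re.IGNORECASE)]
    return "\n".join(keep)

CHANGELOG = """\
## v2.0.0 Breaking changes
- `connect()` now requires a timeout argument

## v2.0.0 Features
- added retry support

## v1.9.0 Breaking changes
- dropped Python 3.7
"""
```

The model calls the tool and gets back only the breaking-change sections; the feature notes never enter the context, but nothing the migration plan depends on is paraphrased away.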

[0] https://tern.sh

jasonjmcghee 7 days ago

What do you mean re Claude Code, "at most 1 sub-agent"?

trjordan 7 days ago

It only spawns a single kind of sub-agent (called Task, iirc), which can do everything Claude Code can, except call Task().

This is different from a lot of the context-preserving sub-agents, which have fully different toolsets and prompts. It's much more general.