Yes, compaction and smaller models help on cost per step.

But my issue wasn't just inefficiency; it was agents retrying when they shouldn't.

I needed visibility and limits per agent/task, plus the ability to cut an agent off, not just optimize it.
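To make that concrete, here is a minimal sketch of per-task limits with a hard cutoff. It is not taken from any particular framework: `AgentBudget`, `BudgetExceeded`, `run_task`, and the two limits (`max_retries`, `max_tokens`) are hypothetical names chosen for illustration, and `step` stands in for whatever one agent attempt looks like in your setup.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when an agent/task hits a hard limit and must be cut off."""


@dataclass
class AgentBudget:
    max_retries: int   # hypothetical limit: retries allowed per task
    max_tokens: int    # hypothetical limit: total tokens allowed per task
    retries: int = 0
    tokens: int = 0
    log: list = field(default_factory=list)  # visibility: one entry per event

    def record_call(self, tokens_used: int) -> None:
        # Track spend and cut the task off once the token limit is passed.
        self.tokens += tokens_used
        self.log.append(("call", tokens_used))
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token limit {self.max_tokens} exceeded")

    def record_retry(self, reason: str) -> None:
        # Track retries and cut the task off once the retry limit is passed.
        self.retries += 1
        self.log.append(("retry", reason))
        if self.retries > self.max_retries:
            raise BudgetExceeded(f"retry limit {self.max_retries} exceeded")


def run_task(step, budget: AgentBudget) -> bool:
    """Call step() until it succeeds or the budget raises BudgetExceeded.

    step is assumed to return a (succeeded, tokens_used) tuple.
    """
    while True:
        ok, tokens_used = step()
        budget.record_call(tokens_used)
        if ok:
            return True
        budget.record_retry("step failed")
```

The point of raising instead of returning a flag is that the cutoff cannot be ignored by the retry loop; the caller decides what to do with a stopped task, and `budget.log` shows what happened up to that point.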

I'm working on a fun project I call OpenFAST, which essentially tries to solve the context-transitioning problem, but it's still early days and I haven't released anything yet.

I think one of the bigger issues is the O(n) orchestrator-to-agent calls, which often feel uncontrolled. The orchestrator of the sub-agents ends up being the main bottleneck because of the large context it sometimes accumulates.

I'm working on an idea where agents produce briefs and deliveries as real artifacts. Each spawned sub-agent reads the briefs and, if it needs further information, picks up the delivery for that specific brief.

It helps with drift detection across agents, and the best part is that the orchestrator only delegates jobs and doesn't do much beyond that.

Once the sub-agents have delivered their tasks, the orchestrator can read a merged brief/delivery for that specific round.

So far it helps cut the extra tool call where each sub-agent answers the orchestrator, and it also lets the orchestrator delve only into the deliveries it believes are relevant, rather than trying to comprehend every small detail.
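For anyone trying to picture the brief/delivery flow, here is a rough sketch under my own assumptions. It is not OpenFAST code (nothing is released yet): the directory layout, the file names `brief.txt` and `delivery.txt`, and the functions `deliver`, `merged_brief`, and `read_delivery` are all hypothetical.

```python
import tempfile
from pathlib import Path


def deliver(root: Path, round_id: int, agent: str, brief: str, delivery: str) -> None:
    """Sub-agent writes a short brief and a full delivery as files (artifacts)."""
    agent_dir = root / f"round-{round_id}" / agent
    agent_dir.mkdir(parents=True, exist_ok=True)
    (agent_dir / "brief.txt").write_text(brief)
    (agent_dir / "delivery.txt").write_text(delivery)


def merged_brief(root: Path, round_id: int) -> dict:
    """Orchestrator reads only the briefs for one round, keyed by agent name."""
    round_dir = root / f"round-{round_id}"
    return {p.parent.name: p.read_text() for p in sorted(round_dir.glob("*/brief.txt"))}


def read_delivery(root: Path, round_id: int, agent: str) -> str:
    """Fetch one full delivery, only when its brief looks relevant."""
    return (root / f"round-{round_id}" / agent / "delivery.txt").read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        deliver(root, 1, "parser", "Parsed 3 files, no errors.", "full parse log ...")
        deliver(root, 1, "tester", "2 tests failing.", "full traceback ...")
        briefs = merged_brief(root, 1)
        print(briefs)
        # The orchestrator opens only the delivery whose brief signals a problem.
        print(read_delivery(root, 1, "tester"))
```

In this sketch the sub-agents never reply to the orchestrator directly; they only write files, and the orchestrator's context grows by the size of the briefs plus whichever deliveries it chooses to open.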

I can share more when I'm a bit further along; maybe you could get some inspiration here.

	bhaviav100 4 hours ago | parent
		This is interesting and I would love to understand more about it. Is there a GitHub repo I can look at? Here's something that might give you another perspective on contexts: https://authority.bhaviavelayudhan.com/journal/35