> Additionally, we’re introducing a new `ultra` mode that goes beyond the capabilities of a single agent by leveraging subagents to accelerate complex work.

I'm curious how this works. Do the subagents also get to use the same tools? Will the client be flooded with tool calls? Why charge extra for a new "model" when the same thing can happen in the client with more control?

And if it's an army of subagents, why do they compare it to Fable and Mythos? I'm guessing those models would bench better with a similar harness.

gck1 4 hours ago | parent | next [-]

If it's anything like Claude Code's ultracode, it's nothing new or revolutionary.

It's essentially a bunch of subagents called by a deterministic script written by the main model thread, each eating tokens for lunch, with their output synthesized by an orchestrator agent.
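A minimal sketch of that fan-out pattern, in case it helps picture it. Everything here is hypothetical: `plan`, `run_subagent`, `synthesize`, and the one-token-per-word accounting are stand-ins I made up for illustration, not anything from the Codex or Claude Code harnesses.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SubagentResult:
    task: str
    output: str
    tokens_used: int


def plan(prompt: str) -> list[str]:
    """Hypothetical planner: the main thread splits the prompt into fixed subtasks."""
    return [f"{prompt} / part {i}" for i in range(1, 4)]


def run_subagent(task: str) -> SubagentResult:
    """Hypothetical stand-in for one subagent; a real harness would call a model here."""
    output = f"findings for: {task}"
    # Fake token accounting (assumption): pretend each whitespace-separated word is one token.
    return SubagentResult(task=task, output=output, tokens_used=len(output.split()))


def synthesize(results: list[SubagentResult]) -> str:
    """Hypothetical orchestrator step: merge the subagent outputs into one answer."""
    return "\n".join(r.output for r in results)


def ultra_style_run(prompt: str) -> tuple[str, int]:
    tasks = plan(prompt)
    # Deterministic fan-out: pool.map returns results in task order.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        results = list(pool.map(run_subagent, tasks))
    # Every subagent adds to the bill, which is where the token cost piles up.
    total_tokens = sum(r.tokens_used for r in results)
    return synthesize(results), total_tokens


if __name__ == "__main__":
    answer, tokens = ultra_style_run("refactor auth")
    print(answer)
    print("tokens:", tokens)
```

The point of the sketch is only the shape: one plan, N parallel workers, one merge step, and a token total that scales with N.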

Sidio 2 hours ago | parent | next [-]

The fact that it's even named Ultra is pretty telling.

mohsen1 4 hours ago | parent | prev | next [-]

The confusion is that ultracode is not a different model with its own benchmarks.

	gck1 4 hours ago | parent [-]
		Neither is OpenAI's ultra. The article specifically calls it a 'mode', and it's not even mentioned in the model card. It's for sure a Codex harness feature. EDIT: yeah, it's the same thing. https://github.com/openai/codex/blob/main/codex-rs/core/test...

enraged_camel 2 hours ago | parent | prev [-]

>> If it's anything like Claude Code's ultracode, it's nothing new or revolutionary.

OpenAI flat out copying Anthropic is a pretty funny development. It's strong evidence that they've been in catch-up mode.

	gck1 18 minutes ago | parent [-]
		Eh, pretty much everyone who spent some time tweaking their harness already had a homemade 'ultracode' long before Anthropic did it. OpenAI is just way more careful about which features they add or enable by default in their harness. Anthropic's harness is a junk drawer of random features, with a new one added every few hours. It feels like they're in panic mode, dropping random things to see what sticks for when models are eventually commoditized. I prefer OpenAI's way: slow and steady.

rolisz an hour ago | parent | prev | next [-]

If it's anything like Claude Ultracode, it burns 3 million tokens in half an hour with a single prompt.

derwiki 4 hours ago | parent | prev | next [-]

Don’t all the major harnesses (pi, Claude Code, Codex) use subagents? Definitely if you direct them to, but I’ve seen at least pi spin them up without explicit instruction.

alansaber 4 hours ago | parent | next [-]

Absolutely yes

te_chris 3 hours ago | parent | prev [-]

With pi they’re an extension, but that’s pi

	MVQ93 10 minutes ago | parent [-]
		Which specific subagent extension do you use?

jamilton 4 hours ago | parent | prev | next [-]

Yeah, I'm interested too. My guess for the reason, if not purely to eke out more performance, is so they can cleanly gather real-world data on this kind of usage.

alansaber 4 hours ago | parent | prev | next [-]

I'm shocked they didn't use subagents already. Maybe they're just talking about their web deployment being unified with codex?

	Sidio 2 hours ago | parent | next [-]
		With Codex, subagents are only used if you specifically prompt for them, unlike Claude Code. Odd, since it's the former that has excess compute available.
	helloplanets 3 hours ago | parent | prev [-]
		Deep Research has been using the Orchestrator -> Subagents -> Synthesizer loop since the beginning. It's just strange that they'd put a loop benchmark next to actual model benchmarks. Maybe it's a tune of the base model that works especially well with the subagent loop?

simianwords 4 hours ago | parent | prev [-]

Claude also has an ultracode mode, which is exactly the same thing. This seems to be different from pro, however.