hansonw 5 hours ago

Rest assured that we are better at training models than naming them ;D

- New benchmark SOTAs with 77.9% on SWE-Bench-Verified, 79.9% on SWE-Lancer, and 58.1% on TerminalBench 2.0

- Natively trained to work over many hours and across multiple context windows via compaction

- 30% more token-efficient at the same reasoning level across many tasks

Let us know what you think!

sinatra 4 hours ago | parent | next [-]

I currently use GPT‑5.1-Codex High and have a workflow that works well with the 5-hour/weekly limits, credits, etc. If I use GPT‑5.1-Codex-Max Medium or GPT‑5.1-Codex-Max High, how will that compare to GPT‑5.1-Codex High in terms of cost, credits, and limits? I don't think that's clear. "Reduced tokens" makes me think it'll be priced similarly or lower, but "Max" makes me think it'll be priced higher.

agentifysh 5 hours ago | parent | prev | next [-]

Did you address this: https://github.com/openai/codex/issues/6426 ?

How much more token-efficient is this compared to 5.0?

I had to use 5.0 because 5.1 was eating tokens like crazy and seemed like only a slight, barely noticeable incremental improvement.

qsort 5 hours ago | parent | prev | next [-]

Codex is an outstanding product and incremental upgrades are always welcome. I'll make sure to give it a try in the coming days. Great work! :)

iyn 5 hours ago | parent | prev | next [-]

Looks like a great change! I'll take it for a spin in a moment.

I really like the "subagent" feature in Claude Code — it's super useful to manage context in complex codebases. Here are some examples of agents that can be useful: https://github.com/humanlayer/humanlayer/tree/main/.claude/a...

Would it make sense to have a similar feature in Codex CLI? I often do "spec-driven development", which is basically a loop of:

    research -> implementation plan -> actual implementation (based on research + plan) -> validation

I have a subagent for each phase, and (based on subjective judgement) they improve the output quality versus keeping everything (every tool use, etc.) in the "main" context window.
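
Roughly, the shape of it looks like the sketch below (purely illustrative; run_agent is a stand-in for however a subagent actually gets invoked, and the role names are my own):

    # Sketch of the loop above: each phase runs as its own agent in a fresh
    # context, and only the written artifacts are carried into the next phase.
    # run_agent is a placeholder, not a real Codex or Claude Code API.
    def run_agent(role: str, prompt: str) -> str:
        print(f"[{role}] {prompt[:60]}...")
        return f"<{role} output>"  # placeholder result

    def spec_driven_loop(task: str) -> str:
        research = run_agent("researcher", f"Investigate the codebase for: {task}")
        plan = run_agent("planner", f"Task: {task}\nResearch:\n{research}\nWrite an implementation plan.")
        change = run_agent("implementer", f"Implement this plan:\n{plan}")
        return run_agent("validator", f"Check the change against the plan:\n{plan}\n{change}")

The point is just that each phase starts with a clean context and only sees the artifacts it needs, not the full tool-use history of the previous phases.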

Codex CLI is great and I use it often, but I'd like more of these convenient context-management features from CC. I'm super happy that compaction is now available; hopefully we'll get more features for managing context.

carbocation 4 hours ago | parent | prev | next [-]

It would be great to have access to this model via the chat interface, even if it was gated behind the "other models" dropdown or something.

NitpickLawyer 5 hours ago | parent | prev | next [-]

Will -minis come for the codex family of models? About two months ago I used 5-mini as a daily driver for a few weeks and quite liked it; it seemed capable enough for small tasks with some hand-holding, and the speed/price were great as well.

coder543 5 hours ago | parent [-]

codex-mini was released a couple of weeks ago: https://platform.openai.com/docs/models/gpt-5.1-codex-mini

NitpickLawyer 4 hours ago | parent [-]

Thanks! I somehow missed that. Will check it out.

SoKamil 39 minutes ago | parent | prev | next [-]

> Natively trained

What does that even mean?

kaveh_h 13 minutes ago | parent [-]

Probably that the model was previously given system instructions on how to do compaction, whereas now compaction is learned during training, making it a native ability that needs no extra instructions in the prompt.
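
As a generic illustration of the "instructed" version (not how Codex actually implements it; chat and count_tokens stand in for whatever client and tokenizer you use): when the transcript nears the context limit, you ask the model to summarize it and continue from the summary alone.

    # Generic sketch of prompt-driven compaction, not Codex's implementation.
    COMPACTION_PROMPT = (
        "Summarize the conversation so far. Keep file paths, decisions made, "
        "and open TODOs so work can continue from this summary alone."
    )

    def maybe_compact(messages, chat, count_tokens, limit=400_000, threshold=0.8):
        # Leave the transcript alone until it approaches the context limit.
        if count_tokens(messages) < limit * threshold:
            return messages
        # Ask the model for a summary, then replace the history with it.
        summary = chat(messages + [{"role": "user", "content": COMPACTION_PROMPT}])
        return [{"role": "system", "content": "Summary of earlier work:\n" + summary}]

The new bit, as I read it, is that the model now handles this behaviour natively instead of relying on an instruction like the one above.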

andai 4 hours ago | parent | prev | next [-]

So the context window is still 400k, but the model got good at removing irrelevant context?

robotswantdata 4 hours ago | parent | prev | next [-]

Sorry, I don't like the Max model; it feels like it needs a lot more guiding. The plans it writes, however, are better, so I tried feeding them back in (meta-prompt style) and it's working okay so far. Very large repository.

EnPissant 5 hours ago | parent | prev | next [-]

Compaction is just what Claude Code has done forever, right?

GardenLetter27 5 hours ago | parent | next [-]

I think the point here is not that it does compaction (which Codex already does), but that the model was trained on examples of the Codex compaction, so it should perform better once compaction has taken place (a common source of performance drops in earlier models).

EnPissant 5 hours ago | parent [-]

Codex previously did only manual compaction, but yeah, maybe some extra training for compaction, too?

enraged_camel 5 hours ago | parent | prev [-]

I am also trying to understand the difference between compaction and what IDEs like Cursor do when they "summarize" context over long-running conversations.

Is this saying that said summarization now happens at the model level? Or are there other differences?

blks 2 hours ago | parent | prev [-]

I think your company will fail soon.

meowface 2 hours ago | parent [-]

I would bet a lot of money it will not.