qwesr123 4 hours ago

Where are you getting SWE-Bench Verified scores for 5.2-Codex? AFAIK those have not been published.

And I don't think your Terminal-Bench 2.0 scores are accurate. Per the latest benchmarks, Opus 4.5 is at 59% and GPT-5.2-Codex is at 64%.

See the charts at the bottom of https://marginlab.ai/blog/swe-bench-deep-dive/ and https://marginlab.ai/blog/terminal-bench-deep-dive/