qwesr123 4 hours ago
Where are you getting SWE-Bench Verified scores for 5.2-Codex? AFAIK those have not been published.

And I don't think your Terminal-Bench 2.0 scores are accurate. Per the latest benchmarks:

Opus 4.5 is at 59%

GPT-5.2-Codex is at 64%

See the charts at the bottom of https://marginlab.ai/blog/swe-bench-deep-dive/ and https://marginlab.ai/blog/terminal-bench-deep-dive/