Remix clone Hacker News

	▲	qwesr123 4 hours ago
		Where are you getting SWE-Bench Verified scores for 5.2-Codex? AFAIK those have not been published. And I don't think your Terminal-Bench 2.0 scores are accurate. Per the latest benchmarks, Opus 4.5 is at 59% and GPT-5.2-Codex is at 64%. See the charts at the bottom of https://marginlab.ai/blog/swe-bench-deep-dive/ and https://marginlab.ai/blog/terminal-bench-deep-dive/