Tiberium a day ago

Only a 1.7% improvement on SWE-Bench compared to GPT-5, but 33.9% vs. 51.3% on their internal code refactoring benchmark. This seems like an Opus 4.1-like upgrade, which is nice to see and means they're serious about Codex.