GPT 5.1 / Codex already beats Gemini 3 on SWE-bench Verified and Terminal-Bench, and this pushes the gap further. Seems like a decent improvement.