Palmik 10 hours ago
It also does not beat GPT-5.1 Codex on Terminal-Bench (57.8% vs. 54.2%): https://www.tbench.ai/

I did not bother verifying the other claims.
HereBePandas 10 hours ago
That's not apples-to-apples. The "Codex CLI (GPT-5.1-Codex)" entry the site refers to uses a specific agentic harness, whereas the Gemini 3 Pro figure seems to come from a standard eval harness. It would be interesting to see an apples-to-apples comparison, i.e., Gemini 3 Pro in Google's best harness alongside Codex CLI.
| |||||||||||||||||||||||||||||||||||||||||