| |
| ▲ | koakuma-chan 21 hours ago | parent | next [-] | | That's what I said in my original message. By my account, GPT 5.2 is better than Gemini 3 Pro and Opus 4.5. Gemini 3 Pro is a great foundation model. I use it as a math tutor, and it's great. I previously used Gemini 2.5 Pro as a math tutor, and Gemini 3 Pro was a qualitative improvement over that. But Gemini 3 Pro sucks at being a coding agent inside a harness. It sucks at tool calling. It's borderline unusable in Cursor because of that, and likely the same in Antigravity. A few weeks ago I attended a demo of Antigravity that Google employees were giving, and it was completely broken. It got stuck for them during the demo, and they ended up not being able to show anything. Opus 4.5 is good, and faster than GPT 5.2, but less reliable. I use it for medium-difficulty tasks. But for anything serious, it's GPT 5.2. | | |
| ▲ | HarHarVeryFunny an hour ago | parent | next [-] | | I'm curious how you are testing these latest models. Do you have specific test or benchmark tasks that they struggle with, or are you working on a real project and just trying alternatives when one model is not performing well? | | |
| ▲ | koakuma-chan an hour ago | parent [-] | | I am using Cursor. It has all the major models: OpenAI, Anthropic, Google, etc. Every time a new model comes out, I test it on a real project (the app that I am working on at work). |
| |
| ▲ | postalcoder 21 hours ago | parent | prev | next [-] | | Agreed. Gemini 3 is still pretty bad at agentic coding. Just yesterday, in Antigravity, while applying changes, it deleted 500 lines of code and replaced them with a `<rest of code goes here>`. Unacceptable behavior in 2025, lol. | | | |
| ▲ | Mkengin 17 hours ago | parent | prev [-] | | Your experience seems to match the recent results from swe-rebench: https://swe-rebench.com/ |
| |
| ▲ | BeetleB 21 hours ago | parent | prev [-] | | Gemini 3.0 Flash outperforms Pro in many tasks - I believe the coding benchmark was one of them. | | |
| ▲ | HarHarVeryFunny an hour ago | parent [-] | | Presumably that would reflect Gemini 3.0 Flash having had more extensive RL training for coding than Pro? Maybe we can expect a "Gemini 3 Pro Coding" model in the future? Opus 4.5 seems different: it is Anthropic's best coding model, but also their frontier general-purpose model. |
|
|