koakuma-chan 21 hours ago

That's what I said in my original message. In my experience, GPT-5.2 is better than Gemini 3 Pro and Opus 4.5.

Gemini 3 Pro is a great foundation model. I use it as a math tutor, and it's great. I previously used Gemini 2.5 Pro as a math tutor, and Gemini 3 Pro was a qualitative improvement over that. But Gemini 3 Pro sucks at being a coding agent inside a harness. It sucks at tool calling. It's borderline unusable in Cursor because of that, and likely the same in Antigravity. A few weeks ago I attended a demo of Antigravity that Google employees were giving, and it was completely broken. It got stuck for them during the demo, and they ended up not being able to show anything.

Opus 4.5 is good, and faster than GPT-5.2, but less reliable. I use it for medium-difficulty tasks. But for anything serious, it's GPT-5.2.

HarHarVeryFunny an hour ago | parent | next [-]

I'm curious how you're testing these latest models. Do you have specific test/benchmark tasks that they struggle with, or are you working on a real project and just trying alternatives when another model isn't performing well?

koakuma-chan an hour ago | parent [-]

I am using Cursor. It has all the major models: OpenAI, Anthropic, Google, etc. Every time a new model comes out, I test it on a real project (the app I'm working on at work).

postalcoder 21 hours ago | parent | prev | next [-]

Agreed. Gemini 3 is still pretty bad at agentic coding.

Just yesterday, in Antigravity, while applying changes, it deleted 500 lines of code and replaced them with a `<rest of code goes here>` placeholder. Unacceptable behavior in 2025, lol.

misiti3780 17 hours ago | parent [-]

lol

Mkengin 17 hours ago | parent | prev [-]

Your experience seems to match the recent results from swe-rebench: https://swe-rebench.com/