jjani 3 days ago

Gemini 2.5 Pro is widely considered superior to 3.7 Sonnet now by heavy users, but they don't have an SWE-bench score. Shows that looking at one such benchmark isn't very telling. Main advantage over Sonnet being that it's better at using a large amount of context, which is enormously helpful during coding tasks.

Sonnet is still an incredibly impressive model as it held the crown for 6 months, which may as well be a decade with the current pace of LLM improvement.

unsupp0rted 3 days ago | parent | next [-]

Main advantage over Sonnet is Gemini 2.5 doesn't try to make a bunch of unrelated changes like it's rewriting my project from scratch.

itsmevictor 3 days ago | parent | next [-]

I find Gemini 2.5 truly remarkable and overall better than Claude, which I was a big fan of.

enraged_camel 3 days ago | parent [-]

Still doesn't work well in Cursor unfortunately.

ai-christianson 3 days ago | parent | next [-]

Works well in RA.Aid. In fact, I'd recommend it as the default model in terms of overall cost and capability.

plantain 3 days ago | parent | prev | next [-]

Working fine here. What problems do you see?

michaelbarton 3 days ago | parent [-]

Not the OP, but I believe they could be referring to the fact that it’s not supported in edit mode yet, only agent mode.

So far for me that’s not been too much of a roadblock, though I still find that Gemini overall struggles with more obscure issues such as SQL errors in dbt.

pdntspa 3 days ago | parent | prev [-]

Cline/Roo Code work fine with it

erikw 3 days ago | parent | prev | next [-]

What language / framework are you using? I ask because in a Node / TypeScript / React project I experience the opposite: Claude 3.7 usually solves my query on the first try and seems to understand the project's context (i.e. the file structure, packages, coding guidelines, tests, etc.), while Gemini 2.5 seems to install packages willy-nilly, duplicate existing tests, create duplicate components, etc.

unsupp0rted 2 days ago | parent [-]

Node / Vue

bitbuilder 3 days ago | parent | prev | next [-]

This was incredibly irritating at first, though over time I've learned to appreciate this "extra credit" work. It can be fun to see what Claude thinks I can do better, or should add in addition to whatever feature I just asked for. Especially when it comes to UI work, Claude actually has some pretty cool ideas.

If I'm using Claude through Copilot where it's "free" I'll let it do its thing and just roll back to the last commit if it gets too ambitious. If I really want it to stay on track I'll explicitly tell it in the prompt to focus only on what I've asked, and that seems to work.

And just today, I found myself leaving a comment like this: //Note to Claude: Do not refactor the below. It's ugly, but it's supposed to be that way.

Never thought I'd see the day I was leaving comments for my AI agent coworker.

TuxSH 3 days ago | parent [-]

> If I'm using Claude through Copilot where it's "free"

Too bad Microsoft is widely limiting this -- have you seen their pricing changes?

I also feel like they nerfed their models, or reduced the context window again.

Aeolun 3 days ago | parent [-]

Claude is almost comically good outside of Copilot. When used through Copilot it’s like working with a lobotomized idiot (that complains it generated public code about half the time).

TuxSH 10 hours ago | parent [-]

It used to be good, or at least quite decent in GH Copilot, but it all turned into poop (the completions, the models, everything) ever since they announced the pricing changes.

Considering that M$ obviously trains over GitHub data, I'm a bit pissed, honestly, even if I get GH Copilot Pro for free.

jdgoesmarching 3 days ago | parent | prev | next [-]

Also that Gemini 2.5 still doesn’t support prompt caching, which is huge for tools like Cline.

scrlk 3 days ago | parent [-]

2.5 Pro supports prompt caching now: https://cloud.google.com/vertex-ai/generative-ai/docs/models...

jdgoesmarching 3 days ago | parent [-]

Oh, that must’ve been in the last few days. Weird that it’s only in 2.5 Pro preview but at least they’re headed in the right direction.

Now they just need a decent usage dashboard that doesn’t take a day to populate or require additional GCP monitoring services to break out the model usage.

Workaccount2 3 days ago | parent | prev | next [-]

Its viable context, the context length where it doesn't fall apart, is also much longer.

zaptrem 3 days ago | parent | prev [-]

I do find it likes to subtly reformat every single line, thereby nuking my diff and making its changes unusable since I can’t verify them that way. Sonnet doesn’t do that.

armen52 3 days ago | parent | prev | next [-]

I don't understand this assertion, but maybe I'm missing something?

Google included a SWE-bench score of 63.8% in their announcement for Gemini 2.5 Pro: https://blog.google/technology/google-deepmind/gemini-model-...

amedviediev 2 days ago | parent | prev | next [-]

I keep seeing this sentiment so often here and on X that I have to wonder if I'm somehow using a different Gemini 2.5 Pro. I've been trying to use it for a couple of weeks already and without exaggeration it has yet to solve a single programming task successfully. It is constantly wrong, constantly misunderstands my requests, ignores constraints, ignores existing coding conventions, breaks my code and then tells me to fix it myself.

spaceman_2020 3 days ago | parent | prev | next [-]

I feel that Claude 3.7 is smarter, but it does way too much and has poor prompt adherence.

redox99 3 days ago | parent | prev | next [-]

2.5 Pro is very buggy with Cursor. It often stops before generating any code. It's likely a Cursor problem, but I use 3.7 because of that.

saberience 2 days ago | parent | prev [-]

Eh, I wouldn't say that's accurate; I think it's situational. I code all day using AI tools and Sonnet 3.7 is still the king. Maybe it's language dependent or something, but all the engineers I know are full-on Claude Code at this point.