Tostino 14 hours ago

I don't know, Gemini 2.5 has been the only model that hasn't consistently made fundamental mistakes with my project as I've been working with it over the last year. Claude 3.7, 4.0, and 4.5 are not nearly as good. I gave up on ChatGPT a couple of years ago, so I have no idea how it performs now; it was bad when I quit using it.

ddxv 13 hours ago | parent | next [-]

I use all of them about equally, and I don't really want to argue the point; I've had this conversation with friends, and it increasingly feels like a matter of brand affiliation and preference. At the end of the day, they're random text generators: asking the same question with different seeds gives different results, and they're all mostly good.
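The "different seeds give different results" point can be made concrete with a toy next-token sampler. The vocabulary and probabilities below are invented purely for illustration; real models sample from a distribution over tens of thousands of tokens, but the mechanism is the same: fixing the seed makes the pick repeatable, while changing it can change the output even though the distribution is identical.

```python
import random

# Toy next-token sampler. vocab and probs are made-up illustrative values,
# standing in for a model's output distribution for one position.
vocab = ["refactor", "rewrite", "patch", "delete"]
probs = [0.4, 0.3, 0.2, 0.1]

def sample_token(seed):
    # A fresh Random(seed) makes the draw fully determined by the seed.
    rng = random.Random(seed)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Same seed -> same token every time; different seeds may differ.
for seed in (1, 2, 3):
    print(seed, sample_token(seed))
```

Hosted APIs that expose a seed parameter work on this principle, though they usually only promise best-effort determinism across runs.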

diogenescynic 10 hours ago | parent | prev [-]

Do you find that Gemini's results are slightly different when you ask the same question multiple times? Of the models I was trying, I found it had the least reproducible results.

Tostino 2 hours ago | parent [-]

Sometimes it will alternate between different design patterns for implementing the same feature on different generations.

If it gets the answer wrong and I notice it, often just regenerating will get past it rather than having to reformulate my prompt.

So I'd say yeah... it's consistent in the general direction or understanding, but not so much in the details. Adjusting the temperature does help with that, but I often just leave it at the default regardless.
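Why lowering the temperature tightens up the details can be sketched with the standard softmax-with-temperature formula: logits are divided by the temperature before normalizing, so a low temperature sharpens the distribution (the top choice is picked almost every time) and a high one flattens it (more variety between generations). The logits below are invented for illustration, loosely standing in for competing design patterns the model might pick between.

```python
import math

# Made-up logits for three candidate "design pattern" tokens.
logits = {"observer": 2.0, "strategy": 1.0, "visitor": 0.2}

def softmax_with_temperature(logits, temperature):
    # Divide logits by T, then apply a numerically stable softmax.
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

for temp in (0.2, 1.0, 2.0):
    dist = softmax_with_temperature(logits, temp)
    top = max(dist, key=dist.get)
    print(f"T={temp}: top token {top!r} with p={dist[top]:.2f}")
```

At T=0.2 the top token gets over 99% of the probability mass (near-greedy, very repeatable), while at T=2.0 it drops to roughly half, which matches the observation that default-temperature runs can alternate between approaches on regeneration.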