verdverm 3 hours ago

It's generally anecdote and vibes when people claim that one AI is better than another for the things they do. There are too many variables and not enough evals for any of it to hold water, imo. Personal preferences, experience, brand loyalty, and bias are at play too.

It's the contemporary vim vs. emacs at this point.

hodgehog11 2 hours ago

I get what you're saying, because this is typically true (it's a strong motivator for my current research), but I don't think it applies here, and OpenAI seems to agree with me. Some cases are clear: GPT-5 is clearly better than Llama 3, for example. If there is a sizeable enough difference across virtually all evals, it is typically clear that one LLM is the stronger performer.

Experiences aside, Gemini 3 beats GPT-5 on enough evals that it seems fair to say that it is a better model. This appears in line with public consensus, with a few exceptions. Those exceptions seem to be centered around search.