beernet 7 days ago

While I tend to agree, the other players (Anthropic, OAI, Google) don't have super unique USPs compared to one another, either. Just to be fair.

elAhmo 7 days ago | parent [-]

I was about to post something similar. Sure, there are preferences, and power users know which model handles their workflow better, but for an average user, a chat box backed by the latest model from any of the providers would be adequate. They might notice a thing or two being different, but at the end of the day there is almost no sticking point once you take chat history out of the equation.

rvnx 7 days ago | parent [-]

Claude Opus 4.1 is way above the others in terms of answer quality (especially for programming).

elAhmo 7 days ago | parent | next [-]

That might be your experience. I also prefer Claude for my tasks, but for general usage they are very close.

Leaderboards like LLM Arena show this: all the latest models rank within 20-30 Elo points of one another, which is almost a coin flip. A 30-point Elo gap corresponds to an expected win rate of roughly 54%/46%, so out of 11 answers you might prefer 6 from the best model and 5 from the worst.
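For anyone who wants to sanity-check that claim, here's a quick sketch using the standard Elo expected-score formula (the 400 divisor is the conventional Elo scale; the 30-point gap is the one mentioned above):

```python
def elo_expected(diff: float) -> float:
    """Expected win rate for the higher-rated player, given a rating gap."""
    return 1 / (1 + 10 ** (-diff / 400))

# A 30-point gap works out to about a 54/46 split:
print(round(elo_expected(30), 3))  # ~0.543
```

0.543 of 11 answers is about 6, which matches the 6-vs-5 intuition.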

jasonjmcghee 6 days ago | parent [-]

It's crazy how different my personal experience is compared to LLM Arena. Very curious what the use cases people are doing that aren't overlapping with mine.

croes 6 days ago | parent | prev [-]

I play code ping-pong between multiple AIs to get decent code. They all fail at some point.