rvnx 6 days ago
Claude Opus 4.1 is way above the others in terms of quality of the answers (especially for programming).
elAhmo 6 days ago
That might be your experience. I also prefer Claude for my tasks, but for general usage they are very close. Leaderboards like LLM arena show this and effectively rank all the latest models within 20-30 points of each other, which is almost a coin flip. A 30-point difference in Elo rating is roughly 54%/46%, so out of 11 answers, you might prefer 6 from the best model and 5 from the worst.
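For anyone who wants to check the arithmetic, here is a minimal sketch of the standard Elo expected-score formula. It assumes the leaderboard's ratings behave like classic Elo with the usual 400-point scale, which may not match exactly how any given arena computes its scores; the function name `elo_expected_score` is just an illustrative choice.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected share of wins for the higher-rated side.

    Uses the standard Elo logistic curve with a 400-point scale
    (an assumption about how the leaderboard's scores behave).
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


if __name__ == "__main__":
    for gap in (20, 30):
        p = elo_expected_score(gap)
        print(f"{gap}-point gap: {p:.1%} vs {1 - p:.1%}")
    # 20-point gap: 52.9% vs 47.1%
    # 30-point gap: 54.3% vs 45.7%
```

Under that assumption, a 30-point gap works out to about 6 preferred answers out of 11 for the higher-rated model, which is consistent with the "almost a coin flip" reading.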
croes 6 days ago
I play code ping pong between multiple AIs to get some decent code. They all fail at some point.