ramoz 2 days ago
https://x.com/giansegato/status/2002203155262812529/photo/1
https://x.com/METR_Evals/status/2002203627377574113

> Even Google's cheaper Gemini 3 Flash model seems to be slightly ahead of Opus 4.5.

What an insane take for anybody who uses these models daily.
MrOrelliOReilly 2 days ago | parent
Yes, I personally feel that the "official" benchmarks are increasingly diverging from the everyday reality of using these models. My theory is that we are reaching a point where all the models are intelligent enough for day-to-day queries, so factors like style/personality, proper use of web queries, and other capabilities are better differentiators than intelligence alone.