woadwarrior01 2 days ago

> Opus 4.5 is absolutely a state of the art model. See: https://artificialanalysis.ai

The field moves fast. Per Artificial Analysis, Opus 4.5 is currently behind GPT-5.2 (x-high) and Gemini 3 Pro. Even Google's cheaper Gemini 3 Flash model seems to be slightly ahead of Opus 4.5.
MrOrelliOReilly 2 days ago

Totally, but OP's point was that Claude had to compensate for deficiencies versus a state of the art model like ChatGPT 5.2, and I don't think that's correct. Whether or not Opus 4.5 is actually #1 on these benchmarks, it is clearly very competitive with the other top-tier models. I didn't take "state of the art" here to narrowly mean #1 on a given benchmark, but rather to mean near or at the frontier of current capabilities.
gessha 2 days ago

One thing to remember when comparing ML models of any kind is that single-value metrics obscure a lot of nuance; you really have to go through the model results one by one to see how each model performs. This is true for vision, NLP, and other modalities.
dr_dshiv 2 days ago

LM Arena shows Claude Opus 4.5 on top: https://lmarena.ai/leaderboard/webdev
ramoz 2 days ago

https://x.com/giansegato/status/2002203155262812529/photo/1
https://x.com/METR_Evals/status/2002203627377574113

> Even Google's cheaper Gemini 3 Flash model seems to be slightly ahead of Opus 4.5.

What an insane take for anybody who uses these models daily.
fzzzy 2 days ago

Is x-high fast enough to use as a coding agent?