Claude Sonnet 5 – benchmark results (artificialanalysis.ai)
39 points by lucamark 9 hours ago | 19 comments
CSMastermind 8 hours ago
Using Fable, pretty much every request hit some gate they had for no discernible reason. These provider-level rejections should be incorporated into benchmarks as 0s on the tasks since that's the experience you'll actually get using the model.
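A minimal sketch of what scoring rejections as 0s could look like, assuming per-task scores between 0 and 1 and using `None` to mark a request the provider rejected. The function name `mean_score`, the flag, and the sample numbers are made up for illustration; they are not from any real benchmark harness.

```python
from typing import Optional, Sequence


def mean_score(
    scores: Sequence[Optional[float]],
    count_rejections_as_zero: bool = True,
) -> float:
    """Average per-task scores; None marks a provider-level rejection (assumed convention)."""
    if count_rejections_as_zero:
        # Rejected tasks stay in the denominator and contribute 0.
        kept = [0.0 if s is None else s for s in scores]
    else:
        # Rejected tasks are dropped, which inflates the average.
        kept = [s for s in scores if s is not None]
    if not kept:
        raise ValueError("no scored tasks")
    return sum(kept) / len(kept)


# Hypothetical results: two answered tasks, two rejected by the provider.
results = [1.0, None, 0.5, None]
print(mean_score(results))                                  # rejections counted as 0
print(mean_score(results, count_rejections_as_zero=False))  # rejections ignored
```

With these made-up numbers, the two modes give different averages for the same run, which is the gap the comment is pointing at.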
Tiberium 8 hours ago
Seems like the model is incredibly inefficient at max reasoning, and even at high/xhigh it uses far more tokens than other models, including Gemini 3.5 Flash, GLM 5.2 and so on. GPT 5.5's efficiency in tokens is still unmatched. See also: https://cursor.com/cursorbench
DrProtic 7 hours ago
I feel like they repackaged Opus, slightly nerfed it, and reduced the price per token. A release just to have a headline while the Fable situation is getting resolved.
nsingh2 8 hours ago
Cost per task is shockingly high. More expensive than Opus 4.8, second only to Fable. Cost per task data is only available for max effort though, so it might just be very inefficient at that effort level.
iLoveOncall 8 hours ago
Half of the data is missing and the rest is inconsistent between different graphs and sections. Is the benchmark having Sonnet 5 generate the page and seeing how many hallucinations it has?
datakan 8 hours ago
I'm so sick of Anthropic's usage caps and how their model devours tokens.
atemerev 8 hours ago
Yet another mediocre model. Mostly irrelevant among open-weights alternatives. Fable wen.