XCSme 5 hours ago

In my tests[0] it does a bit worse, and it's almost 2x as expensive as Opus 4.7...

I was surprised to see it fail a data extraction test (it gets it right 2 out of 3 times, but in one run it randomly returns null for a value).

It makes some sense that it fails more trivia/domain-specific knowledge tasks (I think models are increasingly trained for agentic use cases rather than general intelligence).

[0]: https://aibenchy.com/compare/anthropic-claude-opus-4-7-mediu...

XCSme 5 hours ago | parent | next [-]

For some reason everything is 2x (2x cost, 2x avg response time, 2x reasoning and output tokens)...

Double-checking my test harness, but it's the first model that does this, so I doubt the issue is on my side...

EDIT: Harness seems correct. For straight coding tasks they perform identically: https://i.snipboard.io/5xbpzY.jpg

dwaltrip 4 hours ago | parent | prev | next [-]

Wait, doesn’t the blog post say the price is the same as 4.7?

> Claude Opus 4.8 is available everywhere today. Pricing for regular usage is unchanged from Opus 4.7: $5 per million input tokens and $25 per million output tokens. Pricing for fast mode is $10 per million input tokens and $50 per million output tokens.

Where do you see the 2x cost?

XCSme 4 hours ago | parent | next [-]

The total cost of running my benchmarks was 1.6x higher than with Opus 4.7, mostly because of 2x output tokens:

https://i.snipboard.io/vrdwTa.jpg
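
To make the arithmetic concrete, here is a rough sketch of how unchanged per-token prices can still give a higher bill when output tokens double. The prices are the ones quoted from the blog post above; the token counts are made up for illustration and are not the actual benchmark numbers.

```python
# Prices from the quoted announcement (USD per million tokens, regular mode).
INPUT_PRICE_PER_M = 5.0
OUTPUT_PRICE_PER_M = 25.0


def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a run at the quoted per-token prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical token counts, chosen only for illustration.
baseline = run_cost(input_tokens=1_000_000, output_tokens=300_000)
doubled_output = run_cost(input_tokens=1_000_000, output_tokens=600_000)

# Same input, twice the output tokens, same price list.
ratio = doubled_output / baseline
print(f"{baseline:.2f} -> {doubled_output:.2f} USD ({ratio:.2f}x)")
```

With these assumed counts the total comes out 1.6x higher even though the price list is identical. How far the ratio lands between 1x and 2x depends on how much of the bill is output tokens.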

dwaltrip 3 hours ago | parent [-]

ah ok, thanks for clarifying!

spprashant 4 hours ago | parent | prev | next [-]

If it spends 2x the tokens to achieve the same result, that's effectively 2x the cost, in a manner of speaking.

SupLockDef 4 hours ago | parent | prev [-]

Releasing a new model is the new way to jack up the price, hehe.

eshack94 2 hours ago | parent [-]

That's exactly right.