| ▲ | irthomasthomas 2 hours ago | |
Anthropic has again changed the set of benchmarks they use[0]. This time they have also moved all benchmark scores to the PDF. At a glance it looks like it gains about 5-10% over other models. The speed is about the same as opus >=4.5 and sonnet 4.5, and double the speed of opus <=4.1.
[0] https://news.ycombinator.com/item?id=48312633
Edit: Also in the system card... "we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). ... Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user." | ||
| ▲ | charles_f an hour ago | |
It's announced as a revolution, but when you look at those benchmarks it surely looks like an iteration. | ||