iLoveOncall 5 hours ago
Users preferred it to Sonnet 4.5 in "only" 70% of cases (according to their blog post), which makes me highly doubt that this is representative of real-life usage. Benchmarks are just completely meaningless.
jwolfe 5 hours ago | parent
For cases where 4.5 already met the bar, I would expect 50% preference each way. This makes it kind of hard to make any sense of that number without a bunch more details.
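To make the arithmetic behind that point concrete, here is a rough sketch. It assumes a simple two-bucket model that is not from the blog post: some share of cases are "ties" where both models met the bar and the preference is a coin flip, and in the remaining cases the newer model is preferred at some higher rate. The function names and the win rates tried below are hypothetical, chosen only for illustration.

```python
def overall_preference(tie_share: float, win_rate_when_differs: float) -> float:
    """Blend a 50/50 coin flip on tied cases with the win rate on the rest."""
    return 0.5 * tie_share + win_rate_when_differs * (1.0 - tie_share)


def implied_tie_share(overall: float, win_rate_when_differs: float) -> float:
    """Invert the blend: which tie share reproduces the observed overall preference?

    Only meaningful when win_rate_when_differs is above 0.5.
    """
    return (win_rate_when_differs - overall) / (win_rate_when_differs - 0.5)


# Assumed win rates on the cases where the models actually differ (illustrative only).
for win_rate in (0.75, 0.9, 1.0):
    share = implied_tie_share(0.70, win_rate)
    print(f"win rate {win_rate:.0%} on differing cases -> {share:.0%} of cases tied")
```

Under this assumed model, the same 70% headline figure is compatible with very different stories depending on how many cases were effectively ties, which is why the number is hard to read without more details.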