iLoveOncall 5 hours ago

Given that users preferred it to Sonnet 4.5 in "only" 70% of cases (according to their blog post), I highly doubt that this is representative of real-life usage. Benchmarks are just completely meaningless.

jwolfe 5 hours ago | parent [-]

For cases where 4.5 already met the bar, I would expect a 50% preference each way. That makes it kind of hard to make sense of the 70% number without a lot more detail.
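
To put rough numbers on that: a minimal sketch, assuming a simple two-bucket mixture that is not from the blog post. Some share of prompts is already handled well by 4.5 and splits 50/50, and the rest has its own win rate. The function names and the example shares are hypothetical, chosen only for illustration.

```python
def overall_preference(share_already_good: float, win_rate_elsewhere: float) -> float:
    """Blend a 50/50 coin flip on 'already good' cases with the win rate on the rest."""
    return 0.5 * share_already_good + win_rate_elsewhere * (1.0 - share_already_good)


def implied_win_rate_elsewhere(overall: float, share_already_good: float) -> float:
    """Solve the mixture above for the win rate on cases where 4.5 fell short."""
    if not 0.0 <= share_already_good < 1.0:
        raise ValueError("share_already_good must be in [0, 1)")
    return (overall - 0.5 * share_already_good) / (1.0 - share_already_good)


if __name__ == "__main__":
    # Hypothetical shares of prompts where the older model was already good enough.
    for share in (0.0, 0.25, 0.5):
        rate = implied_win_rate_elsewhere(0.70, share)
        print(f"share already good = {share:.2f} -> implied win rate elsewhere = {rate:.2f}")
```

Under this assumed model, the same 70% headline could mean a 70% win rate everywhere, or roughly a 90% win rate on the half of cases where it actually mattered. Without knowing the mix, the number does not pin down either reading.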

gnatolf 2 hours ago | parent [-]

Good point. So much functionality gets commoditized that we have to move the goalposts more or less constantly.