jsnell 6 hours ago

I really don't get what you're proposing. The cost ranges do not overlap at the low end. You can't (by definition!) interpolate outside of the range.

If you mean extrapolate, at that point you're just making up data. The available effort levels are discrete and fully covered by the benchmarks. You can draw on the monitor with a sharpie to show an "ultra-low" effort level for Opus that scores better than Sonnet "low" at the same price, but it doesn't magic the ultra-low effort level into actual existence.

(Anyway, the blog post now has an erratum and an updated graph that shows substantially better relative performance for Sonnet 5.0 than the original graph did.)

energy123 6 hours ago

That's why I said "over the shared frontier" in my first post and, more precisely, "over the overlapping x values for which both are defined" in my second.

It was a claim about the range of x-values where both curves are defined.

Of course, if you go beyond into x-values where only one of the two is defined, then trivially the one that is defined constitutes the Pareto frontier in that region. Is that your point?
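A minimal sketch of the "overlapping x values" comparison, assuming each model's results are treated as a piecewise-linear cost/score curve with strictly increasing costs. The curves are hypothetical numbers, not the benchmark's, and the helper names (`interp`, `shared_range`, `strictly_better_on_overlap`) are made up for illustration. Interpolation is only allowed inside a curve's own cost range, so the comparison is restricted to the range where both curves exist:

```python
from bisect import bisect_left

# Hypothetical (cost per task, score) curves; costs strictly increasing. Not real benchmark data.
curve_a = ([4.0, 8.0, 16.0], [70.0, 78.0, 82.0])
curve_b = ([1.0, 2.0, 4.0, 8.0], [55.0, 62.0, 68.0, 74.0])


def interp(xs, ys, x):
    """Linearly interpolate the score at cost x; refuse to extrapolate outside [xs[0], xs[-1]]."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError(f"cost {x} is outside [{xs[0]}, {xs[-1]}]; that would be extrapolation")
    i = bisect_left(xs, x)
    if xs[i] == x:  # exact breakpoint
        return ys[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def shared_range(a_xs, b_xs):
    """Cost interval where both curves are defined, or None if they don't overlap."""
    lo, hi = max(a_xs[0], b_xs[0]), min(a_xs[-1], b_xs[-1])
    return (lo, hi) if lo <= hi else None


def strictly_better_on_overlap(a, b):
    """True if curve a scores strictly higher than curve b everywhere in their shared cost range."""
    rng = shared_range(a[0], b[0])
    if rng is None:
        return False
    lo, hi = rng
    # Between pooled breakpoints both curves are linear, so checking the breakpoints suffices.
    xs = sorted({x for x in a[0] + b[0] if lo <= x <= hi} | {lo, hi})
    return all(interp(*a, x) > interp(*b, x) for x in xs)


print(shared_range(curve_a[0], curve_b[0]))          # (4.0, 8.0)
print(strictly_better_on_overlap(curve_a, curve_b))  # True
```

Because the difference of two piecewise-linear curves is itself linear between breakpoints, checking the pooled breakpoints inside the shared range is enough to establish "strictly better" on the whole range. Asking `curve_a` for a score at cost 2.0 raises `ValueError`, since no configuration of it exists at that cost.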

jsnell 6 hours ago

The post I was replying to said "performs strictly better at the same cost per task". That claim was obviously not true: there are costs where Opus cannot do the task at all and Sonnet can, so Opus can't be performing strictly better at the same cost. It seems that you agree that it is not true.

You could make it true by artificially dropping some of the data points, but, like, why?

(Again, this is moot given the updated graph.)

> Of course if you go beyond those x-values where only one of the two are defined, then trivially the one that is defined constitutes the Pareto frontier in that region.

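Not so! It's only sound to do that at the low end of the cost axis (x) or the high end of the performance axis (y). You can't do it at the low end of the performance axis or the high end of the cost axis. Nothing from the other model can be cheaper than a point below the other model's cost range, or score higher than a point above its score range, so those points can't be dominated; but a point beyond the other model's high-cost end, or below its low-score end, can still be dominated by one of the other model's points that costs less and scores more.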
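To make the asymmetry concrete, here's a small sketch that computes the Pareto frontier (minimize cost, maximize score) over the pooled points of two models. The points and the model labels "A" and "B" are invented for illustration, not taken from the benchmark. A's cheap points, which B has nothing to match, stay on the frontier, while A's one point beyond B's high-cost end drops off because a cheaper B point scores higher:

```python
# Hypothetical (cost per task, score, model) points; invented for illustration.
points = [
    (1.0, 55.0, "A"), (2.0, 62.0, "A"), (4.0, 68.0, "A"), (12.0, 75.0, "A"),
    (4.0, 70.0, "B"), (8.0, 78.0, "B"),
]


def dominates(p, q):
    """p dominates q if it costs no more, scores no less, and is strictly better on at least one."""
    return p[0] <= q[0] and p[1] >= q[1] and (p[0] < q[0] or p[1] > q[1])


def pareto_frontier(pts):
    """Points not dominated by any other point (minimize cost, maximize score), sorted by cost."""
    return sorted(p for p in pts if not any(dominates(q, p) for q in pts))


for cost, score, model in pareto_frontier(points):
    print(model, cost, score)
# A 1.0 55.0
# A 2.0 62.0
# B 4.0 70.0
# B 8.0 78.0
```

The point (12.0, 75.0) only exists for A, yet it is not on the frontier: B's (8.0, 78.0) costs less and scores more.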