Are we reading the same chart? They have Sonnet <= high as Pareto dominant on $/perf.
You have to test it on each task, obviously, but it is not a bad model on its face.