I'd love to see this data joined with common benchmarks to find out which models get you the most "bang for your buck", i.e. benchmark score / token cost.
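Roughly what I mean, as a minimal sketch. It assumes you have one table of benchmark scores and one of token prices, both keyed by the same model name. The `bang_for_buck` helper, the model names, and the numbers are made up for illustration, and so is the cost unit (e.g. USD per million tokens):

```python
def bang_for_buck(scores: dict[str, float], costs: dict[str, float]) -> list[tuple[str, float]]:
    """Inner-join scores and costs on model name, then rank by score / cost."""
    ratios = {
        model: scores[model] / costs[model]
        for model in scores.keys() & costs.keys()  # only models present in both tables
        if costs[model] > 0  # skip zero/free pricing to avoid division by zero
    }
    # Highest benchmark score per unit of cost first
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Placeholder data, not real benchmark results or prices
scores = {"model_a": 80.0, "model_b": 60.0, "model_c": 90.0}
costs = {"model_a": 10.0, "model_b": 2.0, "model_d": 1.0}
print(bang_for_buck(scores, costs))  # [('model_b', 30.0), ('model_a', 8.0)]
```

Models missing from either table (here `model_c` and `model_d`) simply drop out, which is the behavior you'd want from an inner join when benchmark coverage is patchy.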