varispeed an hour ago

This is missing data such as when a particular model was nerfed, or how often a provider routes requests to a cheaper, less capable model (variants of so-called adaptive reasoning).

Cost per token alone says nothing. For instance, if the model goes dumb halfway through the task, you have to start again. If a model does that all the time, the real cost is substantially higher than the headline figure.
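To put rough numbers on that, here is a minimal sketch of the retry arithmetic. It assumes attempts are independent, that you retry until one succeeds, and that a failed attempt is abandoned after burning a fixed fraction of a full run's cost. The function name and parameters are made up for illustration.

```python
def expected_cost_per_success(cost_per_attempt: float,
                              success_rate: float,
                              failure_point: float = 0.5) -> float:
    """Expected spend to get one successful completion, retrying until success.

    cost_per_attempt: cost of one full run (the headline figure).
    success_rate:     assumed probability that an attempt succeeds (0 < p <= 1).
    failure_point:    assumed fraction of a full run's cost spent before a
                      failed attempt is abandoned (0.5 = fails halfway).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    if not 0 <= failure_point <= 1:
        raise ValueError("failure_point must be in [0, 1]")
    # Mean number of failures before the first success (geometric distribution).
    expected_failures = (1 - success_rate) / success_rate
    wasted = expected_failures * failure_point * cost_per_attempt
    return cost_per_attempt + wasted


# A model that fails halfway through on half of its attempts:
print(expected_cost_per_success(1.00, success_rate=0.5))   # 1.5
# One that fails halfway through on three attempts out of four:
print(expected_cost_per_success(1.00, success_rate=0.25))  # 2.5
```

Under those assumptions, a model that is cheaper per token but succeeds only a quarter of the time costs 2.5x its headline figure per finished task.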

Such a service should probably run various types of tasks on these models continuously and gauge the quality of the output (though a provider could still detect the probes and pin its best model to skew the results).
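A rough sketch of what the tracking side of that could look like, assuming each probe task is reduced to a pass/fail result. The class name, window size, and threshold are all hypothetical, and it does nothing about the provider detecting the probes.

```python
from collections import deque


class QualityMonitor:
    """Tracks pass/fail results of a fixed probe task suite for one model."""

    def __init__(self, window=20, drop_threshold=0.15):
        self.window = window
        self.drop_threshold = drop_threshold
        self.baseline = []                  # first `window` results ever seen
        self.recent = deque(maxlen=window)  # latest `window` results

    def record(self, passed):
        """Store the outcome of one probe task run."""
        if len(self.baseline) < self.window:
            self.baseline.append(passed)
        self.recent.append(passed)

    def regressed(self):
        """True if the recent pass rate fell below baseline by more than the threshold."""
        if len(self.baseline) < self.window:
            return False  # not enough data to form a baseline yet
        baseline_rate = sum(self.baseline) / self.window
        recent_rate = sum(self.recent) / len(self.recent)
        return baseline_rate - recent_rate > self.drop_threshold
```

Feeding it the result of every scheduled probe run and logging the timestamp whenever `regressed()` flips to true would give the "when was this model nerfed" data point, at least approximately.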