▲ omgJustTest · 5 hours ago
People have been talking about "models of models" as an arbitrage opportunity in inference for about 1.5 years. The arbitrage idea: if a user doesn't need the high QoS of the newest LLM, slip them a cheaper LLM and run their query at reduced quality. Measure whether they cost you fewer dollars at the lower QoS => profit.

For ChatGPT the arbitrage opportunity looks more like "we could allocate this amount of GPU to training or to inference; we are losing money if we offer the highest-quality infra."

In addition, there's other interesting economic scaling that can be done outside of "models of models" that is far more profitable. I won't go over all of it (and some of it I feel is quite powerful), but the laziest example is that subscription models count on zombie users as a counterweight to highly expensive single users, and as a source of stable cash flow. Zombie users are ones that are paying for the sub but are not actively, or only barely, using the service.
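The routing-arbitrage idea above can be sketched in a few lines: pick the cheaper model whenever a query doesn't seem to need top QoS, and tally how much the downgrade saved. Everything here (model names, per-token prices, the complexity heuristic) is an illustrative assumption, not any vendor's actual API or pricing.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_1k_tokens: float  # assumed $ to serve 1k tokens

@dataclass
class Router:
    premium: Model
    cheap: Model
    savings: float = 0.0  # running tally of $ saved by downgrading

    def needs_premium(self, query: str) -> bool:
        # Toy heuristic standing in for a real classifier:
        # long or reasoning-heavy queries keep the big model.
        return len(query) > 200 or "prove" in query.lower()

    def route(self, query: str, est_tokens: int) -> Model:
        if self.needs_premium(query):
            return self.premium
        # Downgrade and record the per-query cost delta => the arbitrage.
        delta = (self.premium.cost_per_1k_tokens
                 - self.cheap.cost_per_1k_tokens) * est_tokens / 1000
        self.savings += delta
        return self.cheap

# Hypothetical prices; the gap between the two is where the profit lives.
router = Router(Model("big-llm", 0.06), Model("small-llm", 0.002))
chosen = router.route("What's the capital of France?", est_tokens=50)
print(chosen.name, round(router.savings, 5))  # → small-llm 0.0029
```

The "measure if they cost you fewer $s" part is the `savings` tally; a real deployment would also track whether answer quality at the lower tier drives users away, which is the other side of the trade.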
▲ graerg · 3 hours ago · parent
They made a big point of explicitly advertising this as a feature with the GPT-5 rollout, no? Routing to cheaper models / less reasoning depending on the input prompt.