mbesto 3 hours ago
> The marginal cost of an API call is small relative to what users pay, and utilization rates at scale are pretty high.

How do you know this?

> You don't need perfect certainty about GPU lifespan to see that the spread between cost-per-token and revenue-per-token leaves a lot of room.

You can't even speculate about this spread without at least a rough idea of cost-per-token. Right now, cost-per-token is entirely paper math.

> And datacenter GPUs have been running inference workloads for years now,

And inference resource intensity is a moving target. What if a new model comes out that requires 2x the resources?

> They're not throwing away two-year-old chips.

Maybe not, but they will be replaced once either (a) a higher-performance GPU can deliver the same results with less energy, less physical density, and less cooling, or (b) the extended support costs become financially untenable.
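To make the point concrete: a back-of-envelope sketch (every number below is hypothetical, not a real provider's figure) shows how strongly the inferred margin depends on the assumed cost-per-token, which is why the "spread leaves a lot of room" claim is pure paper math without a grounded cost figure.

```python
# Hypothetical sensitivity check: how the gross margin on serving tokens
# swings as the assumed cost-per-token changes. All dollar figures are
# made up for illustration; none are real provider economics.

def margin_per_million_tokens(revenue_per_m: float, cost_per_m: float) -> float:
    """Gross margin fraction for serving one million tokens."""
    return (revenue_per_m - cost_per_m) / revenue_per_m

revenue = 10.0  # hypothetical: $10 per million tokens of API revenue

# Three equally speculative cost guesses produce wildly different stories:
for assumed_cost in (1.0, 5.0, 9.0):
    m = margin_per_million_tokens(revenue, assumed_cost)
    print(f"assumed cost ${assumed_cost:.2f}/M tokens -> margin {m:.0%}")
```

The same revenue figure supports a 90% margin narrative or a 10% one depending entirely on the cost assumption, so the spread cannot be asserted without pinning down cost-per-token first.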