panarky 2 hours ago:
The depreciation schedule isn't as big a factor as you'd think. The marginal cost of an API call is small relative to what users pay, and utilization rates at scale are pretty high. You don't need perfect certainty about GPU lifespan to see that the spread between cost-per-token and revenue-per-token leaves a lot of room. And datacenter GPUs have been running inference workloads for years now, so companies have a good idea of failure and obsolescence rates. They're not throwing away two-year-old chips.
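For illustration, here's a back-of-envelope sketch of that spread. Every number in it is a made-up assumption (GPU price, depreciation period, power and hosting cost, utilization, throughput, API price), not a disclosed figure; the point is only how the depreciation slice compares to the rest of the hourly cost when the GPU is kept busy.

```python
# Back-of-envelope sketch of cost-per-token vs. revenue-per-token.
# All figures below are illustrative assumptions, not real disclosed numbers.

gpu_cost_usd = 30_000               # assumed purchase price of one datacenter GPU
depreciation_years = 4              # assumed useful life
power_and_hosting_per_hour = 1.50   # assumed electricity + datacenter overhead, USD/hr
utilization = 0.7                   # assumed fraction of hours spent serving traffic
tokens_per_second = 2_500           # assumed aggregate batched throughput per GPU

hours_per_year = 24 * 365
hourly_capex = gpu_cost_usd / (depreciation_years * hours_per_year)
hourly_cost = hourly_capex + power_and_hosting_per_hour

tokens_per_hour = tokens_per_second * 3600 * utilization
cost_per_million_tokens = hourly_cost / tokens_per_hour * 1_000_000

revenue_per_million_tokens = 10.0   # assumed blended API price per 1M tokens

print(f"cost per 1M tokens:    ${cost_per_million_tokens:.2f}")
print(f"revenue per 1M tokens: ${revenue_per_million_tokens:.2f}")
print(f"implied gross margin:  {1 - cost_per_million_tokens / revenue_per_million_tokens:.0%}")
```

With these made-up numbers, halving the depreciation period from four years to two moves the cost by only a fraction of a dollar per million tokens, which is the sense in which the schedule isn't the dominant factor.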
mbesto an hour ago:
> The marginal cost of an API call is small relative to what users pay, and utilization rates at scale are pretty high.

How do you know this?

> You don't need perfect certainty about GPU lifespan to see that the spread between cost-per-token and revenue-per-token leaves a lot of room.

You can't even speculate about this spread without at least a rough idea of the cost-per-token. Currently, any cost-per-token figure is pure paper math.

> And datacenter GPUs have been running inference workloads for years now,

And inference resource intensity is a moving target. What if a new model comes out that requires 2x the resources?

> They're not throwing away two-year-old chips.

Maybe, but they'll be replaced either (a) by a higher-performance GPU that can deliver the same results with less energy, physical density, and cooling, or (b) when the extended support costs become financially untenable.
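To put rough numbers on that objection: the same back-of-envelope math swings by more than an order of magnitude once the unknowns (depreciation period, utilization, per-GPU throughput) are varied across plausible ranges. All figures here are hypothetical.

```python
# Sensitivity sketch: the same hypothetical unit-cost math with the key
# assumptions varied across plausible ranges. Every figure is made up.

from itertools import product

gpu_cost_usd = 30_000               # assumed GPU purchase price
power_and_hosting_per_hour = 1.50   # assumed electricity + hosting, USD/hr
hours_per_year = 24 * 365

for dep_years, utilization, tokens_per_second in product(
    (2, 4),        # assumed depreciation period, years
    (0.3, 0.7),    # assumed utilization
    (500, 2_500),  # assumed aggregate tokens/sec per GPU
):
    hourly_cost = gpu_cost_usd / (dep_years * hours_per_year) + power_and_hosting_per_hour
    tokens_per_hour = tokens_per_second * 3600 * utilization
    cost_per_million = hourly_cost / tokens_per_hour * 1_000_000
    print(f"dep={dep_years}y util={utilization:.0%} tps={tokens_per_second}: "
          f"${cost_per_million:.2f} per 1M tokens")
```

Under these assumptions the computed cost ranges from well under a dollar to several dollars per million tokens, which is why the "spread" depends entirely on which inputs you assume.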