| ▲ | skybrian 4 hours ago |
| I guess gigawatts is how we roughly measure computing capacity at the datacenter scale? Also saw something similar here: > Costs and pricing are expressed per “token”, but the published data immediately seems to admit that this is a bad choice of unit because it costs a lot more to output a token than input one. It seems to me that the actual marginal quantity being produced and consumed is “processing power”, which is apparently measured in gigawatt hours these days. In any case, I think more than anything this vindicates my original decision not to get too precise. [...] https://backofmind.substack.com/p/new-new-rules-for-the-new-... Is it priced that way, though? I assume next-gen TPUs will be more efficient? |
|
| ▲ | nomel 4 hours ago | parent | next [-] |
> but the published data immediately seems to admit that this is a bad choice of unit because it costs a lot more to output a token than input one And that's silly, because API pricing is already more expensive for output than input tokens: 5x so for Anthropic [1] and 6x so for OpenAI [2]! [1] https://platform.claude.com/docs/en/about-claude/pricing [2] https://openai.com/api/pricing
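The asymmetry above is easy to see in a per-request cost calculation. A minimal sketch, with made-up per-million-token prices that only illustrate the ~5x input/output gap mentioned in the comment (not current published rates; check the linked pricing pages):

```python
# Hypothetical prices in USD per 1M tokens, chosen only to show a ~5x
# output/input gap; real rates vary by provider and model.
PRICES = {
    "example-model": {"input": 3.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call: each token class is billed at its own rate."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A large prompt with a short answer costs far less than the reverse:
cheap = request_cost("example-model", 10_000, 500)    # mostly input tokens
costly = request_cost("example-model", 500, 10_000)   # mostly output tokens
```

With these illustrative rates the output-heavy call is roughly 4x the price of the input-heavy one, even though both move the same total number of tokens.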
| |
| ▲ | AlphaSite 2 hours ago | parent [-] | | I think for the same model, wall time is probably a more intuitive metric; at the end of the day what you’re doing is renting GPU time slices. Large outputs dominate compute time, so they are more expensive. IMO input and output token counts are actually still a bad metric since they linearise non-linear cost increases, and I suspect we’ll see another change in the future where providers bucket by context length. XL output contexts may be 20x more expensive instead of 10x. | | |
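The context-length bucketing this comment speculates about could look something like the sketch below. All thresholds and multipliers here are invented for illustration; no provider's actual tiers are being described:

```python
# Hypothetical pricing tiers: the base per-token rate is multiplied once the
# context crosses a size threshold, approximating non-linear serving cost.
BUCKETS = [
    (32_000, 1.0),         # up to 32k tokens of context: base rate
    (128_000, 2.0),        # up to 128k: 2x
    (float("inf"), 4.0),   # beyond that: 4x
]

def bucketed_rate(context_tokens: int, base_rate: float) -> float:
    """Return the per-million-token rate for the bucket the context falls in."""
    for limit, multiplier in BUCKETS:
        if context_tokens <= limit:
            return base_rate * multiplier
    raise AssertionError("unreachable: last bucket is unbounded")
```

A step function like this is still only a coarse fit to the underlying quadratic attention cost, but it is simple to publish and bill against.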
|
|
| ▲ | brokencode 4 hours ago | parent | prev | next [-] |
Gigawatts seems more like a statement of the power supply and dissipation of the actual facility. I’m assuming you can cram more chips in there if you have more efficient chips to make use of spare capacity? Trying to measure the actual compute is a moving target since you’d be upgrading things over time, whereas the power aspects are probably more fixed by fire code, building size, and utilities.
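The point above reduces to simple arithmetic: the facility's power envelope, not floor space, caps how many accelerators fit. A back-of-envelope sketch, where the per-chip wattage and overhead factor are illustrative assumptions, not measured figures:

```python
# How many accelerators fit in a fixed power envelope. `overhead` plays the
# role of a PUE-style factor covering cooling and power conversion losses;
# 1.25 here is an assumed value for illustration.
def max_accelerators(facility_gw: float, chip_watts: float,
                     overhead: float = 1.25) -> int:
    """Chips that fit in the budget after subtracting facility overhead."""
    usable_watts = facility_gw * 1e9 / overhead
    return int(usable_watts // chip_watts)

# Halving per-chip draw doubles the chip count for the same building:
with_1kw_chips = max_accelerators(1.0, 1000)   # 1 GW facility, 1 kW chips
with_500w_chips = max_accelerators(1.0, 500)
```

This is why a gigawatt figure stays meaningful across hardware generations: the building's budget is fixed, and efficiency gains show up as more chips (or more compute per chip) inside it.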
| |
| ▲ | delichon 3 hours ago | parent | next [-] | | Measuring data centers in watts is like measuring cars in horsepower. Power isn't a direct measure of performance, but of the primary constraint on performance. When in doubt choose the thermodynamic perspective. | |
| ▲ | stingraycharles 2 hours ago | parent | prev [-] | | I mean, a single nuclear reactor delivers around 1 GW, so if a single datacenter consumes multiples of that, it gives a reasonably accurate idea of the scale. |
|
|
| ▲ | twoodfin 3 hours ago | parent | prev [-] |
| That these data centers can turn electricity + a little bit of fairly simple software directly into consumer and business value is pretty much the whole story. Compare what you need to add to AWS EC2 to get the same result, above and beyond the electricity. |
| |
| ▲ | zozbot234 3 hours ago | parent [-] | | That's a convenient story, but most consumers' and businesses' use of AI is light enough that they could easily run local models on their existing silicon. Resorting to proprietary AI running in the datacenter would only add a tiny fraction of incremental value over that, and at a significant cost. | | |
| ▲ | astral_drama 2 hours ago | parent | next [-] | | I'm looking forward to running a Gemma 4 turboquant on my 24GB GPU. The perf looks impressive for how compact it is. I often get 10x more cost-effective processing on my local hardware. Still reaching for frontier models for coding, but I find the hosted models on OpenRouter good enough for simple work. Feels like we are jumping to warp on flops. My cores are throttled and the fiber is lit. | |
| ▲ | twoodfin 3 hours ago | parent | prev [-] | | Sure but where the puck is going is long-running reasoning agents where local models are (for the moment) significantly constrained relative to a Claude Opus 4.6. |
|
|