kingstnap 20 hours ago

5 to 10 tokens per second is a bungus-tier rate.

https://developer.nvidia.com/blog/nvidia-blackwell-delivers-...

NVIDIA's 8xB200 gets you 30k tps on DeepSeek 671B. At maximum utilization that's about 1 trillion tokens per year. At a dollar per million tokens, that's $1 million.

The hardware costs around $500k.

Now ideal throughput is unlikely, so let's say you get half that. It's still ~500B tokens per year.

Gemini 3 Flash is around $3/million tokens, and I assume it's a fair bit bigger, maybe 1 to 2T parameters. I can sort of see how you get this to work with margins, as the AI companies repeatedly assert.
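A quick back-of-envelope sketch of that arithmetic (all inputs are the thread's assumptions, not measured figures):

```python
# Back-of-envelope unit economics for an 8xB200 node serving DeepSeek 671B.
# Every input below is an assumption from this thread, not measured data.

SECONDS_PER_YEAR = 365 * 24 * 3600       # ~31.5M seconds

peak_tps = 30_000                        # claimed aggregate tokens/sec at full utilization
utilization = 0.5                        # derate: ideal throughput is unlikely
price_per_m_tokens = 1.00                # assumed $/1M tokens
hardware_cost = 500_000                  # assumed node cost, USD

tokens_per_year = peak_tps * utilization * SECONDS_PER_YEAR
revenue_per_year = tokens_per_year / 1e6 * price_per_m_tokens

print(f"tokens/year:  {tokens_per_year:.2e}")     # ~4.7e11, i.e. ~500B
print(f"revenue/year: ${revenue_per_year:,.0f}")  # ~$473k
print(f"hardware payback: {hardware_cost / revenue_per_year:.1f} years")
```

At these assumed numbers the hardware pays for itself in roughly a year, before any datacenter or operating costs.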

Denzel 19 hours ago

Cool, that potential 5x cost improvement just got delivered this year. A company can continue running the previous generation until EOL, or take a hit by writing off the residual value - either way they’ll have a mixed cost model that puts their token cost somewhere between previous and current gens.
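For illustration, a blended cost model is just a traffic-weighted average; the fleet split and per-generation costs here are made-up numbers:

```python
# Hypothetical blended token cost across two hardware generations.
# The fleet split and per-generation costs are illustrative only.

prev_gen_cost = 5.0    # assumed $/1M tokens on the older generation
curr_gen_cost = 1.0    # assumed $/1M tokens on the new one (the ~5x improvement)
curr_gen_share = 0.4   # assumed fraction of traffic on new hardware

blended = curr_gen_share * curr_gen_cost + (1 - curr_gen_share) * prev_gen_cost
print(f"blended cost: ${blended:.2f}/1M tokens")   # $3.40 -- between the two gens
```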

Also, you’re missing material capex and opex costs from a DC perspective. Certain inputs exhibit diseconomies of scale when your demand outstrips market capacity. You do notice electricity costs are rising and companies are chomping at the bit to build out more power plants, right?
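To make that concrete, here is the earlier sketch extended with rough datacenter line items; the power draw, PUE, electricity price, and amortization period are all assumed values:

```python
# Rough total-cost-of-ownership sketch for the same 8xB200 node.
# Power draw, electricity price, PUE, and amortization are all assumptions.

node_power_kw = 14.0        # assumed full-node draw, including host
pue = 1.4                   # assumed datacenter overhead (cooling, power delivery)
electricity_price = 0.12    # assumed $/kWh; rising prices push this up
hours_per_year = 8760
hardware_cost = 500_000     # assumed node cost, USD
amortization_years = 4      # assumed useful life before the next gen lands

power_cost = node_power_kw * pue * hours_per_year * electricity_price
capex_per_year = hardware_cost / amortization_years

print(f"power/year: ${power_cost:,.0f}")       # ~$20.6k
print(f"capex/year: ${capex_per_year:,.0f}")   # $125k
# Facility build-out, networking, staff, and cost of capital would add more.
```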

Again, I ran the numbers for simplicity’s sake to show it’s not clear-cut that these models are profitable. “I can sort of see how you can get this to work” agrees with exactly what I said: it’s unclear, and certainly not a slam dunk.

Especially when you factor in all the other real-world costs.

We’ll find out soon enough.

surajrmal 13 hours ago

Google runs everything on their TPUs, which are substantially less costly to make and use less energy to run. While I'm sure OpenAI and others are bleeding money by subsidizing things, I'm not entirely sure that's true for Google (even though subsidizing would actually be easier for them if they wanted to).