▲ | caminanteblanco 4 days ago
There was some tangentially related discussion in this post: https://news.ycombinator.com/item?id=45050415, but this cost analysis answers so many questions and gives me a better idea of how huge a margin on inference a lot of these providers could be taking. Plus, I'm sure Google or OpenAI can get more favorable data center rates than the average Joe Schmoe. A node of 8 H100s will run you $31.40/hr on AWS, so for all 96 you're looking at $376.80/hr. With 188 million input tokens/hr and 80 million output tokens/hr, that comes out to around $2/million input tokens and $4.70/million output tokens. This is actually a lot more than DeepSeek R1's rates of $0.10-$0.60/million input and $2/million output, but I'm sure major providers are not paying AWS p5 on-demand pricing.

Edit: those throughput figures were per node, so the actual prices should be divided by 12: $0.17/million input tokens and $0.39/million output tokens.
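As a rough sketch of the corrected arithmetic (a Python sketch, assuming the per-node throughput figures from the post hold; like the estimate above, it charges the full node cost to each token stream separately):

    # Back-of-the-envelope cost per million tokens, per 8xH100 node.
    # Figures are from this thread; real throughput varies with workload.
    node_rate = 31.40        # $/hr, AWS p5 on-demand, one 8xH100 node
    input_tph = 188e6        # input tokens per hour, per node
    output_tph = 80e6        # output tokens per hour, per node

    # Charging the whole node cost to each stream (an upper bound for each):
    print(f"${node_rate / (input_tph / 1e6):.2f} per M input tokens")   # ~$0.17
    print(f"${node_rate / (output_tph / 1e6):.2f} per M output tokens") # ~$0.39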
▲ | zipy124 4 days ago
AWS is absolutely not cheap, and never has been. You want to look for the Hetzner of the GPU world, like runpod.io, where H100s are $2 an hour, so $16/hr for 8; that's already half of AWS. You can also almost certainly get a volume discount if you're looking for 96. An H100 costs about $32k, which amortized over 3-5 years gives $1.22 to $0.73 per hour, so even adding in electricity, CPU/RAM, etc., runpod.io is running much closer to the actual cost than AWS is.
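A minimal sketch of that amortization, assuming 24/7 utilization and ignoring power, hosting, and failure costs:

    # Straight-line amortization of an H100's purchase price.
    capex = 32_000  # $ per H100, as estimated above

    for years in (3, 5):
        hours = years * 365 * 24
        print(f"{years} years: ${capex / hours:.2f}/GPU/hr")  # ~$1.22 and ~$0.73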
| |||||||||||||||||||||||
▲ | bluedino 4 days ago
> A node of 8 H100s will run you $31.40/hr on AWS, so for all 96 you're looking at $376.80/hr

And what stinks is that you can't even build a Dell/HPE server like this online. You have to 'request a quote' for an 'AI server'. Going through Supermicro, you're looking at about $60k for the server, plus 8 GPUs at $25,000 each, so you're close to $300,000 for an 8-GPU node.

Now, that doesn't include networking, storage, racks, electricity, cooling, someone to set it all up for you, $1,000 DAC cables, NVIDIA middleware, or downtime, since H100s are the flakiest pieces of junk ever and will need to be replaced every so often... Setting up a 96-H100 cluster (12 of those puppies) in this case is probably going to cost you $4-5 million. But it should cost less than AWS after a year and a half.
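A quick sketch of that break-even point against AWS on-demand, using the rough figures above (the capex is the $4-5M guess, not a quote, and ongoing power/staff costs are ignored):

    # Break-even: cluster capex vs. renting 12 8xH100 nodes on demand.
    aws_rate = 376.80        # $/hr for 12 on-demand 8xH100 nodes on AWS
    cluster_capex = 4.5e6    # midpoint of the $4-5M estimate above

    breakeven_years = cluster_capex / aws_rate / (365 * 24)
    print(f"break-even after ~{breakeven_years:.1f} years")  # ~1.4 years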
| |||||||||||||||||||||||
▲ | matt-p 4 days ago
188M input / 80M output tokens per hour was per node, I thought? Reversing out these numbers tells us that they're paying about $2/H100/hour (or $16/hour for an 8xH100 node). Disclaimer (one of my sites): https://www.serversearcher.com/servers/gpu says that a one-month commit on an 8xH100 node goes for $12.91/hour. The "I'm buying the servers and putting them in colo" rate usually works out at around $10/hour, so there's scope here to reduce the cost by ~30% just by doing better/more committed purchasing.
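For reference, the reverse-out is just the token price times throughput (a sketch assuming the post's $0.20/1M-output figure and the per-node throughput above):

    # Implied GPU-hour rate from a token price and per-node throughput.
    output_price = 0.20      # $ per 1M output tokens, per the post
    output_tph = 80e6        # output tokens per hour, per 8xH100 node

    node_rate = output_price * output_tph / 1e6    # $/hr per node
    print(f"${node_rate:.2f}/node/hr = ${node_rate / 8:.2f}/H100/hr")  # $16.00 / $2.00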
| |||||||||||||||||||||||
▲ | caminanteblanco 4 days ago
OK, so the authors apparently used Atlas Cloud hosting, which charges $1.80 per H100/hr. That would change the overall cost to around $0.08/million input and $0.18/million output, which seems much more in line with massive inference margins for the major providers.
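Plugging that rate into the same arithmetic as above (assuming the quoted $1.80/H100/hr and the post's per-node throughput):

    # Cost per million tokens at Atlas Cloud's quoted rate.
    node_rate = 1.80 * 8     # $14.40/hr for an 8xH100 node
    print(f"${node_rate / 188:.2f}/M input, ${node_rate / 80:.2f}/M output")  # ~$0.08 / ~$0.18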
▲ | paxys 4 days ago
According to the post, their costs were $0.20/1M output tokens (on cloud GPUs), so your numbers are off somewhere.