▲ | caminanteblanco 4 days ago
There was some tangentially related discussion in this post: https://news.ycombinator.com/item?id=45050415, but this cost analysis answers so many questions and gives me a better idea of how huge a margin on inference a lot of these providers could be taking. Plus, I'm sure Google or OpenAI can get more favorable data center rates than the average Joe Schmoe. A node of 8 H100s will run you $31.40/hr on AWS, so for all 96 you're looking at $376.80/hr. With 188 million input tokens/hr and 80 million output tokens/hr, that comes out to around $2/million input tokens and $4.70/million output tokens. This is actually a lot more than DeepSeek R1's rates of $0.10-$0.60/million input and $2/million output, but I'm sure major providers are not paying AWS p5 on-demand pricing.

Edit: those throughput figures were per node, so the actual prices should be divided by 12: $0.17/million input tokens and $0.39/million output tokens.
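As a rough sketch of the corrected arithmetic (a Python sketch, assuming the per-node throughput figures from the post hold; like the estimate above, it charges the full node cost to each token stream separately):

    # Back-of-the-envelope cost per million tokens, per 8xH100 node.
    # Figures are from this thread; real throughput varies with workload.
    node_rate = 31.40        # $/hr, AWS p5 on-demand, one 8xH100 node
    input_tph = 188e6        # input tokens per hour, per node
    output_tph = 80e6        # output tokens per hour, per node

    # Charging the whole node cost to each stream (an upper bound for each):
    print(f"${node_rate / (input_tph / 1e6):.2f} per M input tokens")   # ~$0.17
    print(f"${node_rate / (output_tph / 1e6):.2f} per M output tokens") # ~$0.39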
▲ | zipy124 4 days ago
AWS is absolutely not cheap, and never has been. You want to look for the Hetzner of the GPU world, like runpod.io, where H100s are $2 an hour, so $16/hr for 8; that's already half of AWS. You can also almost certainly get a volume discount if you're looking for 96. An H100 costs about $32k, which amortized over 3-5 years gives $1.22 to $0.73 per hour, so even adding in electricity, CPU/RAM, etc., runpod.io is running much closer to the actual cost than AWS is.
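A minimal sketch of that amortization, assuming 24/7 utilization and ignoring power, hosting, and failure costs:

    # Straight-line amortization of an H100's purchase price.
    capex = 32_000  # $ per H100, as estimated above

    for years in (3, 5):
        hours = years * 365 * 24
        print(f"{years} years: ${capex / hours:.2f}/GPU/hr")  # ~$1.22 and ~$0.73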
| |||||||||||||||||||||||
▲ | bluedino 4 days ago
> A node of 8 H100s will run you $31.40/hr on AWS, so for all 96 you're looking at $376.80/hr

And what stinks is that you can't even build a Dell/HPE server like this online. You have to 'request a quote' for an 'AI server'. Going through Supermicro, you're looking at about $60k for the server, plus 8 GPUs at $25,000 each, so you're close to $300,000 for an 8-GPU node.

Now, that doesn't include networking, storage, racks, electricity, cooling, someone to set it all up for you, $1,000 DAC cables, NVIDIA middleware, or downtime, since H100s are the flakiest pieces of junk ever and will need to be replaced every so often... Setting up a 96-H100 cluster (12 of those puppies) in this case is probably going to cost you $4-5 million. But it should cost less than AWS after a year and a half.
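A quick sketch of that break-even point against AWS on-demand, using the rough figures above (the capex is the $4-5M guess, not a quote, and ongoing power/staff costs are ignored):

    # Break-even: cluster capex vs. renting 12 8xH100 nodes on demand.
    aws_rate = 376.80        # $/hr for 12 on-demand 8xH100 nodes on AWS
    cluster_capex = 4.5e6    # midpoint of the $4-5M estimate above

    breakeven_years = cluster_capex / aws_rate / (365 * 24)
    print(f"break-even after ~{breakeven_years:.1f} years")  # ~1.4 years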
| |||||||||||||||||||||||
▲ | matt-p 4 days ago
188M input / 80M output tokens per hour was per node, I thought? Reversing out these numbers tells us that they're paying about $2/H100/hour (or $16/hour for an 8xH100 node). Disclaimer (one of my sites): https://www.serversearcher.com/servers/gpu says that a one-month commit on an 8xH100 node goes for $12.91/hour. The "I'm buying the servers and putting them in colo" rate usually works out at around $10/hour, so there's scope here to reduce the cost by ~30% just by doing better/more committed purchasing.
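For reference, the reverse-out is just the token price times throughput (a sketch assuming the post's $0.20/1M-output figure and the per-node throughput above):

    # Implied GPU-hour rate from a token price and per-node throughput.
    output_price = 0.20      # $ per 1M output tokens, per the post
    output_tph = 80e6        # output tokens per hour, per 8xH100 node

    node_rate = output_price * output_tph / 1e6    # $/hr per node
    print(f"${node_rate:.2f}/node/hr = ${node_rate / 8:.2f}/H100/hr")  # $16.00 / $2.00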
| |||||||||||||||||||||||
▲ | caminanteblanco 4 days ago
OK, so the authors apparently used Atlas Cloud hosting, which charges $1.80 per H100/hr. That would change the overall cost to around $0.08/million input and $0.18/million output, which seems much more in line with massive inference margins for the major providers.
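Plugging that rate into the same arithmetic as above (assuming the quoted $1.80/H100/hr and the post's per-node throughput):

    # Cost per million tokens at Atlas Cloud's quoted rate.
    node_rate = 1.80 * 8     # $14.40/hr for an 8xH100 node
    print(f"${node_rate / 188:.2f}/M input, ${node_rate / 80:.2f}/M output")  # ~$0.08 / ~$0.18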
▲ | paxys 4 days ago
According to the post, their costs were $0.20/1M output tokens (on cloud GPUs), so your numbers are off somewhere.