himata4113 6 hours ago
I was looking into self-hosting DeepSeek v4 pro, since frankly cache reads are an absolute scam and they're 90% of the cost. But then I looked at the ROI, and it will never pay off fast enough: the hardware becomes obsolete before it amortizes, even if you were running 10 token-generation streams 24/7. The napkin math came out to renting being around 27 times cheaper than owning (not including power). I think we're really screwed when it comes to owned access to AI unless Intel comes out swinging with a C-series card that has 128 GB of VRAM, so we could run these models in a 4x128 GB configuration, but that seems unlikely since Nvidia has a large stake in them. This was calculated assuming around 30 tok/s; of course you can get 2-5 tok/s much, much cheaper, but that's unusable for my workflow.
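The rent-vs-own trade-off above can be sketched as a break-even calculation. All the numbers here (hardware price, rental rate) are illustrative assumptions, not the commenter's actual figures:

```python
# Napkin-math sketch of rent vs. own for an inference box.
# All figures below are hypothetical assumptions.

HOURS_PER_MONTH = 730

def months_to_break_even(hardware_cost, rental_rate_per_hour,
                         utilization=1.0):
    """Months of rental spend needed to equal the hardware price."""
    monthly_rental = rental_rate_per_hour * HOURS_PER_MONTH * utilization
    return hardware_cost / monthly_rental

# e.g. a hypothetical $60k multi-GPU box vs. renting at $2/hour, 24/7:
print(round(months_to_break_even(60_000, 2.0), 1))  # ~41 months
```

If the hardware is obsolete in less time than the break-even horizon (and utilization is below 100%, which it usually is for a single user), owning never pays off, which is the commenter's point.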
kingstnap 5 hours ago
Ironically, the few people not scamming you on cache reads are DeepSeek. Everyone else charges a ridiculous amount, but DeepSeek's API is $0.003625 / M tok. I'm surprised no one talks about this, given how significant it is. GPT 5.5, for example, costs a ridiculous $0.50 / M tok cached. That's almost 140 times cheaper, which matters a lot for tool calls.
dist-epoch 3 hours ago
The only way to serve AI profitably is with large batch sizes: run 500 requests at the same time. If you serve a single user, you'll never make back the electricity cost, never mind the hardware.
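The batching argument comes down to amortizing one server's power draw across many concurrent streams. A minimal sketch, where the power draw, electricity price, and per-stream throughput are all assumed numbers:

```python
# Why batch size dominates serving economics: the same forward pass
# (roughly) serves the whole batch, so power cost per token collapses.
# All figures are illustrative assumptions.

def electricity_cost_per_million_tokens(power_kw, price_per_kwh,
                                        tokens_per_sec_per_stream,
                                        batch_size):
    # Simplification: throughput scales linearly with batch size
    # until the GPU becomes compute-bound.
    tokens_per_hour = tokens_per_sec_per_stream * batch_size * 3600
    cost_per_hour = power_kw * price_per_kwh
    return cost_per_hour / tokens_per_hour * 1_000_000

# Hypothetical 10 kW server at $0.15/kWh, 30 tok/s per stream:
solo = electricity_cost_per_million_tokens(10, 0.15, 30, 1)       # ~$13.89 / M tok
batched = electricity_cost_per_million_tokens(10, 0.15, 30, 500)  # ~$0.028 / M tok
print(round(solo, 2), round(batched, 4))
```

At batch size 1 the electricity alone can exceed the API prices quoted upthread; at batch size 500 it is far below them, which is the provider's margin.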
varispeed 4 hours ago
Would you mind sharing the napkin maths?