mythz 5 hours ago
Did the napkin math on M3 Ultra ROI when DeepSeek V3 launched: at $0.70/2M tokens and 30 tps, a $10K M3 Ultra would take ~30 years of non-stop inference to break even, without even factoring in electricity. Clearly people aren't self-hosting to save money. I've got a lite GLM sub at $72/yr, which would need ~138 years to burn through the $10K M3 Ultra sticker price. Even GLM's highest-cost Max tier (20x the lite price) at $720/yr would only buy you ~14 years.
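A minimal sketch of that break-even arithmetic, using only the numbers from the comment above (24/7 uptime assumed, electricity and quantization ignored):

```python
# Napkin math: how long a $10K box takes to "earn back" its price in API tokens.
# Assumed inputs from the comment: $0.70 per 2M tokens, 30 tok/s local throughput.
API_PRICE_PER_TOKEN = 0.70 / 2_000_000   # dollars per token
LOCAL_TPS = 30                            # tokens per second on the M3 Ultra
HARDWARE_COST = 10_000                    # dollars, electricity ignored

tokens_per_year = LOCAL_TPS * 60 * 60 * 24 * 365
api_cost_per_year = tokens_per_year * API_PRICE_PER_TOKEN
print(f"API-equivalent spend per year: ${api_cost_per_year:,.0f}")
print(f"Years to break even: {HARDWARE_COST / api_cost_per_year:.1f}")
# -> roughly $330/year of API-equivalent output, ~30 years to break even.
```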
ljosifov 5 hours ago
Everyone should do the calculation for themselves. I too pay for a couple of subs. But I'm noticing that having an agent work for me 24/7 changes the calculation somewhat. Often not taken into account: the price of input tokens. To produce 1K of code for me, the agent may need to churn through 1M tokens of codebase. I don't know whether the API provider will cache that or not, but it makes a 5-7x price difference. Good discussion today about that and more: https://x.com/alexocheema/status/2020626466522685499
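A rough sketch of why caching dominates agentic costs. The per-token prices below are placeholders, not any provider's actual rates; the point is the ratio, which lands in the 5-7x range the comment mentions:

```python
# Cost of one agent step that reads ~1M tokens of codebase to emit ~1K tokens
# of code, with and without prompt caching. Prices are assumed, for illustration.
INPUT_PRICE = 0.28 / 1_000_000          # $/token, assumed uncached input rate
CACHED_INPUT_PRICE = INPUT_PRICE / 6    # assumed ~6x discount on cache hits
OUTPUT_PRICE = 1.10 / 1_000_000         # $/token, assumed output rate

input_tokens = 1_000_000    # codebase the agent churns through
output_tokens = 1_000       # code it actually produces

uncached = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
cached = input_tokens * CACHED_INPUT_PRICE + output_tokens * OUTPUT_PRICE
print(f"uncached: ${uncached:.3f}  cached: ${cached:.3f}  "
      f"ratio: {uncached / cached:.1f}x")
```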
wongarsu 5 hours ago
And it's worth noting that you can get DeepSeek at those prices from DeepSeek (Chinese), DeepInfra (US with a Bulgarian founder), NovitaAI (US), AtlasCloud (US with a Chinese founder), ParaSail (US), etc. There is no shortage of companies offering inference, with varying levels of trustworthiness, certifications, and promises around (lack of) data retention. You just have to pick one you trust.
oceanplexian 5 hours ago
Doing inference on a Mac Mini to save money is more or less holding it wrong. Of course if you buy overpriced Apple hardware it's going to take years to break even. Buy a couple of real GPUs, run tensor parallelism and concurrent batched requests with vLLM, and owning your own hardware becomes extremely cost-competitive.
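A minimal sketch of the setup being described, tensor parallelism plus batched generation in vLLM. The model name and GPU count are illustrative assumptions, not a recommendation (a 4-GPU box won't hold full DeepSeek V3 unquantized):

```python
# Shard a model across several GPUs and feed it a batch of prompts; vLLM's
# scheduler runs them concurrently, which is where the throughput comes from.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",  # assumed example model
    tensor_parallel_size=4,             # split weights across 4 GPUs
)
params = SamplingParams(max_tokens=256, temperature=0.7)

prompts = [f"Summarize ticket #{i}" for i in range(64)]  # batched requests
outputs = llm.generate(prompts, params)
for out in outputs[:2]:
    print(out.outputs[0].text[:80])
```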
DeathArrow 4 hours ago
I don't think an Apple PC can run the full DeepSeek or GLM models. Even if you quantize the hell out of the models to fit them in memory, they will be very slow.
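For reference, a back-of-the-envelope check of how much memory the weights alone would need at different quantization levels. Parameter counts are the published totals; KV cache, activations, and runtime overhead are ignored, so real usage is higher:

```python
# Weight memory footprint = parameters * bits_per_weight / 8, reported in GiB.
MODELS = {
    "DeepSeek-V3 (671B total params)": 671e9,
    "GLM-4.5 (355B total params)": 355e9,
}
for name, params in MODELS.items():
    sizes = ", ".join(
        f"{bits}-bit ≈ {params * bits / 8 / 2**30:,.0f} GiB"
        for bits in (16, 8, 4)
    )
    print(f"{name}: {sizes}")
```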