bbor · 2 days ago
I was curious, and some [intrepid soul](https://wavespeed.ai/blog/posts/deepseek-v4-gpu-vram-require...) did an analysis. Assuming you do everything perfectly and take full advantage of the model's MoE sparsity, it would take:

- To run at full precision: "16–24 H100s", giving us ~$400-600k upfront, or $8-12/h from [us-east-1](https://intuitionlabs.ai/articles/h100-rental-prices-cloud-c...).
- To run with "heavy quantization" (16 bits -> 8): "8xH100", giving us $200K upfront and $4/h.
- To run truly "locally"--i.e. in a house instead of a data center--you'd need four 4090s, among the most powerful consumer GPUs available. Even that would clock in at around $15k for the cards alone and ~$0.22/h for electricity (in the US).

Truly an insane industry. This is a good reminder of why datacenter capex since 2023 has eclipsed the Manhattan Project, the Apollo program, and the US interstate system combined...
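For anyone who wants to sanity-check figures like these, here's a rough back-of-envelope sketch. The 80 GB of VRAM per H100 is real; the card price, rental rate, 4090 wattage, and electricity cost below are my own assumed round numbers, not figures from the linked analysis:

```python
import math

# Rough back-of-envelope for a 671B-parameter model. All prices/rates below
# are assumed round numbers (not from the linked article): ~$25k per H100,
# ~$0.50/h rental per H100, ~450 W per RTX 4090, ~$0.12/kWh US electricity.
PARAMS_BILLION = 671
H100_VRAM_GB = 80
H100_PRICE_USD = 25_000
H100_RENT_USD_PER_HOUR = 0.50

def h100s_needed(bytes_per_param: float) -> int:
    """H100s needed just to hold the weights (KV cache/activations need more)."""
    weights_gb = PARAMS_BILLION * bytes_per_param  # 1B params * N bytes = N GB
    return math.ceil(weights_gb / H100_VRAM_GB)

for label, bytes_per_param in [("BF16 (full precision)", 2.0), ("INT8 (8-bit quant)", 1.0)]:
    n = h100s_needed(bytes_per_param)
    print(f"{label}: {n}x H100, ~${n * H100_PRICE_USD:,} to buy, "
          f"~${n * H100_RENT_USD_PER_HOUR:.2f}/h to rent")

# "Local" electricity estimate: four 4090s at ~450 W each, $0.12/kWh.
print(f"4x 4090 electricity: ~${4 * 0.450 * 0.12:.2f}/h")
```

Weights alone land in the same ballpark as the quoted figures (17 H100s at BF16, 9 at 8-bit, ~$0.22/h for the 4090s' power); real deployments need extra headroom for KV cache, activations, and parallelism overhead, which is presumably where the wider 16-24 range comes from.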
oceanplexian · 2 days ago
All these numbers are peanuts to a mid-sized company. A place I worked at used to spend a couple million just for a support contract on a NetApp. Ten years from now, that hardware will be on eBay for any geek with a couple thousand dollars and enough power to run it.
zargon · 2 days ago
That article is a total hallucination. It quotes "671B total / 37B active" and "Full precision (BF16)", and claims they ran this non-existent model on vLLM and SGLang over a month and a half ago. It's clickbait keyword slop filled in with V3 specs. Most of the web is slop like this now. Sigh.