kelseyfrog | 2 days ago
What's the hardware cost of running it?
bbor | 2 days ago
I was curious, and some [intrepid soul](https://wavespeed.ai/blog/posts/deepseek-v4-gpu-vram-require...) did an analysis. Assuming you do everything perfectly and take full advantage of the model's MoE sparsity, it would take:

- To run at full precision: "16–24 H100s", giving us ~$400-600k upfront, or $8-12/h from [us-east-1](https://intuitionlabs.ai/articles/h100-rental-prices-cloud-c...).
- To run with "heavy quantization" (16 bits -> 8): "8xH100", giving us ~$200k upfront and ~$4/h.
- To run truly "locally"--i.e. in a house instead of a data center--you'd need four 4090s (one of the most powerful consumer GPUs available). Even that would clock in around $15k for the cards alone and ~$0.22/h for electricity (in the US).

Truly an insane industry. This is a good reminder of why datacenter capex since 2023 has eclipsed the Manhattan Project, the Apollo program, and the US interstate system combined...
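The arithmetic behind those figures can be sketched in a few lines. This is a rough back-of-envelope estimate, not the linked analysis's method: the 671B parameter count is the published DeepSeek-V3 size used as a stand-in, and the 450 W per 4090 and $0.12/kWh electricity rate are illustrative assumptions.

```python
# Back-of-envelope GPU cost arithmetic. All inputs are assumptions:
# ~671B params (DeepSeek-V3's published size), 450 W per RTX 4090,
# ~$0.12/kWh US average residential electricity.

def vram_needed_gb(params_billions: float, bytes_per_param: float) -> float:
    """Raw weight storage only; ignores KV cache and activation overhead."""
    return params_billions * bytes_per_param

def electricity_usd_per_hour(num_gpus: int, watts_per_gpu: float,
                             usd_per_kwh: float) -> float:
    """Hourly electricity cost for a set of GPUs running at full draw."""
    return num_gpus * watts_per_gpu / 1000 * usd_per_kwh

fp16 = vram_needed_gb(671, 2)  # FP16: 2 bytes per parameter
int8 = vram_needed_gb(671, 1)  # INT8 quantization halves that

# H100s have 80 GB each, so weights alone imply:
print(f"FP16: ~{fp16:.0f} GB -> {fp16 / 80:.1f}+ H100s")  # ~17, before overhead
print(f"INT8: ~{int8:.0f} GB -> {int8 / 80:.1f}+ H100s")  # ~8.4

# Four 4090s at ~450 W, ~$0.12/kWh:
print(f"Electricity: ~${electricity_usd_per_hour(4, 450, 0.12):.3f}/hour")
```

The FP16 figure lands at ~17 cards for weights alone, which is why the analysis quotes "16–24 H100s" once KV cache and activation overhead are included, and the electricity estimate comes out to roughly the ~$0.22/h cited above.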
redox99 | 2 days ago
Probably like 100 USD/hour
slashdave | 2 days ago
"if you have to ask..."