| ▲ | DeepSeek says its hit AI model cost just $294k to train (reuters.com) |
| 18 points by jonbaer 17 hours ago | 9 comments |
| |
|
| ▲ | PaulRobinson 16 hours ago | parent | next [-] |
They made this claim in a peer-reviewed paper submitted to Nature, but it’s not clear how reviewers could verify it. If it’s true, and the consensus is that we are hitting limits on how much these models can be improved, the hypothesis that the entire market is in a bubble over-indexed on GPU costs [0] starts to look more credible. At the very least, OpenAI and Anthropic look ridiculously inefficient. Mind you, given that the numbers on the Oracle deal don’t add up, this is all starting to sound insane already. [0] https://www.wheresyoured.at/the-haters-gui/ |
| |
| ▲ | fspeech 5 hours ago | parent [-] | | These numbers were readily corroborated by those who attempted to replicate the RL portion of their work. The foundation model training is harder to verify, but it is also not central to the paper. |
|
|
| ▲ | onion2k 17 hours ago | parent | prev [-] |
| Maybe, if you don't include the >$10m investment in H800 hardware. Still a lot cheaper than competitors though. |
| |
| ▲ | 48terry 15 hours ago | parent | next [-] | | Yes, if we include a cost they didn't include, the cost would be different. | | |
| ▲ | beaner_count 12 hours ago | parent [-] | | More like, if you exclude costs, things cost whatever you want to tell people they cost. |
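A quick, hedged sketch of what the hardware figure above implies per GPU-hour, assuming (purely for illustration) that the >$10m buys the 512-GPU H800 cluster discussed elsewhere in the thread, is amortized over three years, and runs at full utilization; none of these assumptions come from DeepSeek:

```python
# Hedged back-of-envelope: implied owned-hardware cost per GPU-hour.
# Inputs: ">$10m investment in H800 hardware" (from the comment above, taken as a lower bound)
# and a 512-GPU cluster (mentioned later in the thread). The 3-year amortization window and
# 100% utilization are assumptions for illustration only.

hardware_cost_usd = 10_000_000      # lower bound of the hardware investment cited above
gpus = 512                          # cluster size discussed in this thread
amortization_years = 3              # assumed useful life
utilization = 1.0                   # assumed full utilization (optimistic)

gpu_hours_available = gpus * 24 * 365 * amortization_years * utilization
capex_per_gpu_hour = hardware_cost_usd / gpu_hours_available   # roughly $0.74

print(f"~${capex_per_gpu_hour:.2f} per GPU-hour for hardware alone, "
      f"before power, networking, staff, and failed experiments")
```

On those assumptions the capex alone amortizes to well under the $2/hour rental figure discussed below, which is why the single-run number and the total company spend answer different questions.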
| |
| ▲ | jml7c5 15 hours ago | parent | prev [-] | | No, their calculation is based on a rental price of $2 per GPU-hour. | | |
| ▲ | yorwba 14 hours ago | parent [-] | | Right, but they didn't use rented GPUs, so it's a purely notional figure. It's an appropriate value for comparison to other single training runs (e.g. it tells you that turning DeepSeek-V3 into DeepSeek-R1 cost much less than training DeepSeek-V3 from scratch) but not for the entire budget of a company training LLMs. DeepSeek spent a large amount upfront to build a cluster that they can run lots of small experiments on over the course of several years. If you only focus on the successful ones, it looks like their costs are much lower than they were end-to-end. | | |
| ▲ | yunohn 12 hours ago | parent [-] | | No, they’re saying that training a model, specifically DeepSeek, costs X given N hours of GPU Y at rental prices. | | |
| ▲ | yorwba 4 hours ago | parent [-] | | If by "they" you mean DeepSeek, they're not saying this, since you might not actually be able to rent a cluster of 512 H800s wired together with high-bandwidth interconnects at that GPU-hour price point. If you rent smaller groups of GPUs piecemeal in different locations and try to transfer weight updates between them over the internet, it'll kill your throughput. |
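For readers who want the arithmetic behind the headline figure, here is a rough sketch using only the numbers mentioned in this thread (the $294k total, the $2/GPU-hour rate, and a 512-H800 cluster); the exact GPU-hour count reported in the paper may differ:

```python
# Back-of-envelope: what $294k buys at the notional rental rate discussed above.
# All inputs come from this thread; the true GPU-hour figure in the Nature paper may differ.

total_cost_usd = 294_000        # headline training cost
rate_per_gpu_hour = 2.0         # notional H800 rental price used in the calculation
cluster_gpus = 512              # cluster size mentioned above

gpu_hours = total_cost_usd / rate_per_gpu_hour      # 147,000 GPU-hours
wall_clock_hours = gpu_hours / cluster_gpus         # ~287 hours
wall_clock_days = wall_clock_hours / 24             # ~12 days

print(f"{gpu_hours:,.0f} GPU-hours ≈ {wall_clock_hours:.0f} hours ≈ "
      f"{wall_clock_days:.0f} days of wall-clock time on {cluster_gpus} GPUs")
```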
|
|
|
|