nsoonhui 3 days ago

I think the fact that DeepSeek trains on competitor outputs (i.e., distillation) — along with its use of banned Nvidia chips — helps explain how it achieved such low training costs (USD 6 million vs. billions) while delivering only slightly worse performance than its American counterparts. It also undermines the narrative that DeepSeek, or China, poses a serious challenge to the U.S. lead in AI. The gap may be closing, but the initial reactions now look knee-jerk.

That the discussion has been hijacked and shifted toward moral superiority is really unfortunate, because that was never the point in the first place.

whimsicalism 3 days ago | parent

These models never cost billions to train, and I doubt the final training run for a model like GPT-4 cost more than eight figures. Six million is definitely cheaper, though, and I would attribute that to distillation.