gmd63 5 days ago
DeepSeek was trained with distillation. Any accurate estimate of its training costs should include the training costs of the model it was distilled from.
ffsm8 5 days ago
That makes the calculation nonsensical, because if you go there... you'd also have to include all the energy used in producing the content the other model providers used. So now suddenly you're counting everyone's devices on which they wrote comments on social media, pretty much every server that has ever served a request to OpenAI's, Google's, or Anthropic's bots, etc. Seriously, that claim was always completely disingenuous.