yunohn 15 hours ago
No, they’re saying that training a model, specifically DeepSeek, costs X using N hours of Y GPU rental.
yorwba 6 hours ago
If by "they" you mean DeepSeek, they're not saying this, since you might not actually be able to rent a cluster of 512 H800s wired together with high-bandwidth interconnects at that GPU-hour price point. If you rent smaller groups of GPUs piecemeal in different locations and try to transfer weight updates between them over the internet, it'll kill your throughput.
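
To put rough shape on that argument, here is a back-of-envelope sketch. Every number in it is a made-up placeholder, not a figure from DeepSeek or from any rental provider, and the helper names `rental_cost` and `sync_seconds` are hypothetical. It only shows the two pieces of arithmetic involved: the headline cost as GPU-hours times an hourly rate, and how long one full weight update would take to move over a single link, ignoring latency, compression, and any overlap with compute.

```python
def rental_cost(gpu_hours: float, price_per_gpu_hour: float) -> float:
    """Headline cost: GPU-hours multiplied by the hourly rental rate."""
    return gpu_hours * price_per_gpu_hour


def sync_seconds(update_bytes: float, link_bits_per_second: float) -> float:
    """Seconds to push one weight update over one link (no latency, no overlap)."""
    return update_bytes * 8 / link_bits_per_second


# Hypothetical placeholders, chosen only to make the arithmetic visible.
UPDATE_BYTES = 10e9      # assumed size of one weight update: 10 GB
CLUSTER_LINK = 400e9     # assumed in-cluster interconnect: 400 Gbit/s
INTERNET_LINK = 1e9      # assumed link between rented sites: 1 Gbit/s

fast = sync_seconds(UPDATE_BYTES, CLUSTER_LINK)
slow = sync_seconds(UPDATE_BYTES, INTERNET_LINK)

print(f"in-cluster sync: {fast:.2f} s per update")
print(f"internet sync:   {slow:.2f} s per update")
print(f"slowdown factor: {slow / fast:.0f}x")
```

Under these assumed numbers the per-update transfer time grows in proportion to the bandwidth gap, which is why the same GPU-hour price does not buy the same training run on scattered hardware.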