▲ | fourthark 8 hours ago | |
Seems like training would be a better match, where you need tons of compute but don’t care about latency. | ||
▲ | ronsor 33 minutes ago | parent [-] | |
No, the problem is that with training, you do care about latency, and you need a crap-ton of bandwidth too! Think of the all_gather; think of the gradients! Inference is actually easier to distribute. |