| ▲ | zozbot234 2 hours ago | |
If you split models using pipeline/layer parallelism you don't have to care about a slow interconnect, you're just slowed down a lot when running a single inference at a time as opposed to a fully pipelined minibatch. But tensor parallelism requires much faster interconnects than you could get in your average server, so I'm not sure that a different motherboard would help all that much. | ||