pdpi 15 hours ago

> Do you really need a fully connected mesh? Doesn't Thunderbolt just show up as a network connection that RDMA is run on top of?

If you daisy-chain four nodes, then traffic between nodes #1 and #4 eats into the bandwidth of nodes #2 and #3, and you pay a big latency penalty for the extra hops. So, absent a switch, the fully connected mesh is the only way to have fast access to all the memory.
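To put rough numbers on that, here is a minimal sketch that compares a four-node daisy chain with a four-node full mesh. It assumes every pair of nodes exchanges one unit of traffic and that traffic takes the shortest path; `shortest_path` and `summarize` are names made up for this illustration, and nothing here models real Thunderbolt or RDMA behaviour.

```python
from collections import deque
from itertools import combinations

def shortest_path(links, src, dst):
    """Breadth-first search over an undirected link list (assumes a connected graph)."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    # Walk back from dst to src to recover the path.
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]

def summarize(n, links):
    """Return (worst-case hop count, number of node pairs sharing the busiest link)."""
    max_hops = 0
    load = {frozenset(link): 0 for link in links}
    for a, b in combinations(range(1, n + 1), 2):
        path = shortest_path(links, a, b)
        max_hops = max(max_hops, len(path) - 1)
        for u, v in zip(path, path[1:]):
            load[frozenset((u, v))] += 1
    return max_hops, max(load.values())

chain = [(i, i + 1) for i in range(1, 4)]   # 1-2-3-4
mesh = list(combinations(range(1, 5), 2))   # every pair directly linked

print("chain:", summarize(4, chain))
print("mesh: ", summarize(4, mesh))
```

Under those assumptions the chain has a worst case of three hops and its middle link is shared by four of the six node pairs, while in the mesh every pair is one hop apart and has a link to itself.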

rbanffy 3 hours ago | parent [-]

Can’t you make bandwidth reservations and optimise data location to prefer comms between directly connected nodes over one or two-hop paths?

KeplerBoy 3 hours ago | parent [-]

Sure, one could think of some kind of pipeline parallelism where you only need a fast transfer to the next stage of the model. That would boost throughput but not increase model size.
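A minimal sketch of that layout, assuming a model of 32 identical layers split into contiguous stages across four daisy-chained nodes (the layer count and the helper names `assign_stages` and `stage_transfers` are hypothetical, picked only for illustration):

```python
def assign_stages(num_layers, num_nodes):
    """Split layers into contiguous stages, one per node, as evenly as possible."""
    base, extra = divmod(num_layers, num_nodes)
    stages, start = [], 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages

def stage_transfers(num_nodes):
    """Activations flow only from each stage to the next one: (sender, receiver)."""
    return [(node, node + 1) for node in range(1, num_nodes)]

stages = assign_stages(32, 4)
for node, layers in enumerate(stages, start=1):
    print(f"node {node}: layers {layers.start}..{layers.stop - 1}")

# Each transfer is between neighbouring nodes, so a chain serves it in one hop.
print(stage_transfers(4))
```

With this split, the only transfers are 1→2, 2→3 and 3→4, which are exactly the links a daisy chain already has, so no traffic needs to be relayed through an intermediate node.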