CaliforniaKarl 11 hours ago

The "all nodes connecting to all other nodes" setup reminds me of NUMALink, the interconnect that SGI used on many (most? all?) of their supercomputers. In an ideal configuration, each 4-socket node has two NUMALink connections to every other node. As Jeff says, it's a ton of cables, and you don't have to think of framing or congestion in the same way as with RDMA over Ethernet.

T3OU-736 11 hours ago | parent | next [-]

SGI's HW also had ccNUMA (cache-coherent Non-Uniform Memory Access), which, given the latencies possible in systems _physically_ spanning entire rooms, was quite a feat.

The IRIX OS even had functionality to migrate jobs and their working memory closer to each other to lower access latency.

We see echoes of this today when high-frequency trading firms pay attention to motherboard layout, co-locating and pinning their proprietary trading system (PTS) processes to specific cores based on which DIMMs sit on which side of the memory controller.
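Something like the following captures the basic idea on Linux (a simplified sketch, not anything from a real trading system; the node number is an assumption, and production setups also involve numactl/libnuma, NIC placement, and IRQ affinity):

```python
# Discover which CPUs sit on a given NUMA node and pin the current process to
# them, so its memory accesses stay on the DIMMs attached to that node's
# memory controller.
import os

def cpus_on_node(node: int) -> set[int]:
    """Parse a sysfs cpulist like '0-15,32-47' into a set of CPU ids."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        cpus: set[int] = set()
        for part in f.read().strip().split(","):
            if "-" in part:
                lo, hi = map(int, part.split("-"))
                cpus.update(range(lo, hi + 1))
            else:
                cpus.add(int(part))
        return cpus

if __name__ == "__main__":
    node = 0  # assumption: the hot data and NIC live on node 0
    os.sched_setaffinity(0, cpus_on_node(node))  # pid 0 = this process
    print("pinned to CPUs:", sorted(os.sched_getaffinity(0)))
```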

_zoltan_ 3 hours ago | parent | prev [-]

Just as an NVL72 rack today has 72 GPUs with 18 NVLink links each (probably) connecting all those GPUs together within the rack.
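For scale, here's my arithmetic (assuming the commonly cited 18 NVLink ports per GPU for an NVL72 rack, which the comment itself hedges on), contrasted with what a hypothetical direct full mesh would require:

```python
# Back-of-the-envelope link counts for a 72-GPU rack (assumed figures).
gpus = 72
nvlink_ports_per_gpu = 18                    # assumption: 18 NVLink ports per GPU
switch_links = gpus * nvlink_ports_per_gpu   # GPU-to-switch links via NVSwitch
direct_mesh_pairs = gpus * (gpus - 1) // 2   # if every GPU were wired to every other

print(switch_links)       # 1296
print(direct_mesh_pairs)  # 2556
```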