clan 3 hours ago

As Jeff states, there are really no Thunderbolt switches, which currently limits the size of the cluster.

But would it be possible to utilize RoCE with these boxes rather than RDMA over Thunderbolt? And what would the expected performance be? As I understand it, RDMA should be 7-10 times faster than going via TCP. If I understand correctly, RoCE is RDMA over Converged Ethernet, so it uses Ethernet frames and the lower layers rather than TCP.
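
A quick way to see that distinction in practice: with libibverbs (rdma-core) on Linux, the port's link layer tells you whether a device is doing RoCE or native InfiniBand. A minimal sketch, assuming a verbs-capable NIC on port 1:

```c
/* Minimal sketch: distinguish RoCE from native InfiniBand by the
 * port's link layer, using libibverbs (rdma-core) on Linux.
 * Build with: gcc check_link.c -libverbs */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void) {
    int num = 0;
    struct ibv_device **list = ibv_get_device_list(&num);
    for (int i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(list[i]);
        if (!ctx)
            continue;
        struct ibv_port_attr port;
        /* Port 1 assumed; multi-port cards would need a loop. */
        if (ibv_query_port(ctx, 1, &port) == 0) {
            /* RoCE reports IBV_LINK_LAYER_ETHERNET; a native IB
             * fabric reports IBV_LINK_LAYER_INFINIBAND. */
            printf("%s: link layer %s\n",
                   ibv_get_device_name(list[i]),
                   port.link_layer == IBV_LINK_LAYER_ETHERNET
                       ? "Ethernet (RoCE)" : "InfiniBand");
        }
        ibv_close_device(ctx);
    }
    ibv_free_device_list(list);
    return 0;
}
```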

10G Thunderbolt adapters are fairly common, but you can find 40G and 80G Thunderbolt Ethernet adapters from Atto. Probably not cheap, but it would be fun to test! Even if the bandwidth is there, though, we might get killed by latency.

Imagine this hardware with a PCIe slot. The InfiniBand hardware is there - then we "just" need the driver.

KeplerBoy 3 hours ago | parent | next [-]

At that point you could just break out the Thunderbolt to PCIe and use a regular NIC. Actually, I'm pretty sure that's all there is to the Atto ThunderLink: a case around a Broadcom NIC.

Then you _just_ need the driver. Fascinating that Apple ships MLX5 drivers; that's crazy, IMO. I understand that's something they might need internally, but shipping that on iPadOS is wild. https://kittenlabs.de/blog/2024/05/17/25gbit/s-on-macos-ios/

clan 3 hours ago | parent [-]

That is what I am suggesting with the Atto adapter.

InfiniBand is way faster and has lower latency than a NIC. These days NIC == Ethernet.

KeplerBoy 3 hours ago | parent [-]

What makes you think InfiniBand is faster than Ethernet? Aren't they pretty much equal these days with RDMA and kernel bypass?

jamesfmilne 3 hours ago | parent | prev [-]

macOS ships with drivers for Mellanox ConnectX cards, but I have no idea if they will show up in `ibv_devices` or `ibv_devinfo`.
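
For reference, `ibv_devices` is more or less a thin wrapper over device enumeration in libibverbs. A minimal sketch of the same check (assuming rdma-core userspace is installed, which stock macOS does not provide even though the kernel driver is there):

```c
/* Sketch of roughly what ibv_devices does: enumerate verbs
 * providers via libibverbs. If no provider is registered, the
 * kernel may still drive the card as plain Ethernet without
 * exposing RDMA to userspace. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void) {
    int num = 0;
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list || num == 0) {
        puts("no RDMA devices found");
        return 1;
    }
    for (int i = 0; i < num; i++)
        /* GUID is printed here in raw byte order; the real tool
         * formats it per-byte. */
        printf("%s\tGUID %016llx\n",
               ibv_get_device_name(list[i]),
               (unsigned long long)ibv_get_device_guid(list[i]));
    ibv_free_device_list(list);
    return 0;
}
```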