xmddmx 15 hours ago

I was impressed by the lack of dominance of Thunderbolt:

"Next I tested llama.cpp running AI models over 2.5 gigabit Ethernet versus Thunderbolt 5"

Results from that graph showed only a ~10% benefit from TB5 vs. Ethernet.

Note: the M3 Mac Studios support 10 Gbps Ethernet, but that wasn't tested; the test used 2.5 Gbps Ethernet instead.

If 2.5G Ethernet was only ~10% slower than TB5, how would 10G Ethernet have fared?
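As a rough sanity check (nominal link rates plus the ~10% figure from the graph; everything else here is illustrative, not from the video):

```python
# Compare raw link bandwidth against the observed ~10% throughput gap.
# Link rates are nominal; only the ~10% speedup comes from the graph.
links_gbps = {"2.5GbE": 2.5, "10GbE": 10.0, "TB5 (nominal)": 80.0}

for name, gbps in links_gbps.items():
    print(f"{name}: {gbps / links_gbps['2.5GbE']:.0f}x the raw bandwidth of 2.5GbE")

# TB5 has ~32x the raw bandwidth of 2.5GbE yet delivered only ~1.1x the
# throughput, so the run isn't bandwidth-bound; 10GbE's 4x could at best
# recover part of that ~10% gap.
```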

Also, TB5 has to be wired so that every machine is connected directly to every other over TB, limiting you to 4 Macs.

By comparison, with Ethernet you could use a hub-and-spoke configuration through an Ethernet switch, theoretically letting you cluster more than 4 machines.
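The wiring math behind that, sketched out (a full mesh among n machines needs n(n-1)/2 cables and n-1 ports per machine, while a switched star needs just one port each; port counts here are illustrative):

```python
# Links and ports required: full Thunderbolt mesh vs. Ethernet star.

def full_mesh_links(n: int) -> int:
    # Every machine wired directly to every other machine.
    return n * (n - 1) // 2

def star_links(n: int) -> int:
    # One uplink per machine to a central switch.
    return n

for n in (2, 3, 4, 5, 8):
    print(f"{n} machines: mesh={full_mesh_links(n)} cables "
          f"({n - 1} TB ports each), star={star_links(n)} cables (1 port each)")
```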

gwehrli 2 hours ago

This video tests the setup using 10 Gbps Ethernet: https://www.youtube.com/watch?v=4l4UWZGxvoc

geerlingguy 14 hours ago

10G Ethernet would only marginally speed things up, based on past experience with llama.cpp RPC; reducing latency helps much more, but even then there are diminishing returns with that layer split.
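A toy per-token model of a pipelined layer split (all constants are illustrative guesses, not measurements) shows why latency is the bigger lever:

```python
# One token step: local compute plus, per hop, a per-hop latency and the
# transfer of a small activation tensor. Constants are made up for scale.

def token_time_ms(compute_ms, hops, latency_us, activation_kb, link_gbps):
    transfer_ms = (activation_kb * 8e3) / (link_gbps * 1e9) * 1e3
    return compute_ms + hops * (latency_us / 1e3 + transfer_ms)

base = dict(compute_ms=8.0, hops=3, activation_kb=32.0)

for name, lat_us, gbps in [("2.5GbE", 300, 2.5),
                           ("10GbE", 300, 10.0),
                           ("TB5", 30, 80.0)]:
    t = token_time_ms(latency_us=lat_us, link_gbps=gbps, **base)
    print(f"{name}: {t:.2f} ms/token -> {1000 / t:.1f} tok/s")

# With ~32 KB activations, even 2.5GbE moves each hop's data in ~0.1 ms,
# so 10GbE barely helps (~3%) while TB5's lower per-hop latency gives a
# double-digit bump: bandwidth isn't the lever here.
```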

MBCook 14 hours ago

That's llama.cpp, which didn't scale nearly as well in the tests, presumably because it's not optimized yet.

RDMA is always going to have lower overhead than Ethernet, isn't it?

Neywiny 14 hours ago

Possibly RDMA over Thunderbolt. But for RoCE (RDMA over Converged Ethernet), obviously not, because it sits on top of Ethernet. RoCE could still achieve higher throughput once you factor in the CPU time spent running custom protocols that a smart NIC could simply DMA instead, but the on-wire overhead is still definitively higher.

_zoltan_ 3 hours ago

What do you think "Ethernet's overhead" is?

Neywiny 3 hours ago

Header and FCS, interpacket gap, and preamble. What do you think "Ethernet overhead" is?
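For scale, those fixed per-frame costs (standard Ethernet figures: 8 B preamble+SFD, 14 B header with no VLAN tag, 4 B FCS, 12 B interpacket gap) amount to only a few percent of the wire at full-size frames:

```python
# Fixed per-frame Ethernet costs in bytes (standard values, untagged frames).
PREAMBLE, HEADER, FCS, IPG = 8, 14, 4, 12
FIXED = PREAMBLE + HEADER + FCS + IPG  # 38 bytes per frame

for payload in (46, 512, 1500, 9000):  # 46 = minimum payload, 9000 = jumbo
    on_wire = payload + FIXED
    print(f"{payload:>5} B payload: {FIXED / on_wire:6.1%} overhead on the wire")
```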

_zoltan_ 13 minutes ago

I meant in usec, sorry if that wasn't clear, given that the discussion I replied to was about RPC latency.
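Putting that in microseconds (the framing bytes themselves, at each line rate; the usec-scale costs asked about here mostly come from the software stack rather than the wire, as a rough assumption):

```python
# Serialization time for the 38 fixed framing bytes per frame, in usec.
# Kernel TCP round trips on a LAN typically run tens of usec (rough
# reference point, not a measurement), so framing time is a rounding error.
FIXED_BITS = 38 * 8

for name, gbps in [("2.5GbE", 2.5), ("10GbE", 10.0)]:
    usec = FIXED_BITS / (gbps * 1e3)  # bits / (Gbit/s) -> microseconds
    print(f"{name}: {usec:.3f} us of framing per frame")
```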