xmddmx 15 hours ago

I was impressed by the lack of dominance of Thunderbolt:

"Next I tested llama.cpp running AI models over 2.5 gigabit Ethernet versus Thunderbolt 5"

Results from that graph showed only a ~10% benefit from TB5 vs. Ethernet.

Note: the M3 Mac Studios support 10 Gbps Ethernet, but that wasn't tested; the test used 2.5 Gbps Ethernet instead.

If 2.5G Ethernet was only ~10% slower than TB5, how would 10G Ethernet have fared?
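As a rough sanity check (nominal link rates plus the ~10% figure from the graph; everything else here is illustrative, not from the video):

```python
# Compare raw link bandwidth against the observed ~10% throughput gap.
# Link rates are nominal; only the ~10% speedup comes from the graph.
links_gbps = {"2.5GbE": 2.5, "10GbE": 10.0, "TB5 (nominal)": 80.0}

for name, gbps in links_gbps.items():
    print(f"{name}: {gbps / links_gbps['2.5GbE']:.0f}x the raw bandwidth of 2.5GbE")

# TB5 has ~32x the raw bandwidth of 2.5GbE yet delivered only ~1.1x the
# throughput, so the run isn't bandwidth-bound; 10GbE's 4x could at best
# recover part of that ~10% gap.
```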

Also, TB5 has to be wired so that every machine is connected directly to every other over TB, limiting you to 4 Macs.

By comparison, with Ethernet you could use a hub-and-spoke configuration through an Ethernet switch, theoretically letting you cluster more than 4 machines.
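The wiring math behind that, sketched out (a full mesh among n machines needs n(n-1)/2 cables and n-1 ports per machine, while a switched star needs just one port each; port counts here are illustrative):

```python
# Links and ports required: full Thunderbolt mesh vs. Ethernet star.

def full_mesh_links(n: int) -> int:
    # Every machine wired directly to every other machine.
    return n * (n - 1) // 2

def star_links(n: int) -> int:
    # One uplink per machine to a central switch.
    return n

for n in (2, 3, 4, 5, 8):
    print(f"{n} machines: mesh={full_mesh_links(n)} cables "
          f"({n - 1} TB ports each), star={star_links(n)} cables (1 port each)")
```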

gwehrli 2 hours ago

This video tests the setup using 10 Gbps Ethernet: https://www.youtube.com/watch?v=4l4UWZGxvoc

geerlingguy 14 hours ago

10G Ethernet would only marginally speed things up, based on past experience with llama.cpp RPC; reducing latency helps much more, but even then there are diminishing returns with that layer split.
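A toy per-token model of a pipelined layer split (all constants are illustrative guesses, not measurements) shows why latency is the bigger lever:

```python
# One token step: local compute plus, per hop, a per-hop latency and the
# transfer of a small activation tensor. Constants are made up for scale.

def token_time_ms(compute_ms, hops, latency_us, activation_kb, link_gbps):
    transfer_ms = (activation_kb * 8e3) / (link_gbps * 1e9) * 1e3
    return compute_ms + hops * (latency_us / 1e3 + transfer_ms)

base = dict(compute_ms=8.0, hops=3, activation_kb=32.0)

for name, lat_us, gbps in [("2.5GbE", 300, 2.5),
                           ("10GbE", 300, 10.0),
                           ("TB5", 30, 80.0)]:
    t = token_time_ms(latency_us=lat_us, link_gbps=gbps, **base)
    print(f"{name}: {t:.2f} ms/token -> {1000 / t:.1f} tok/s")

# With ~32 KB activations, even 2.5GbE moves each hop's data in ~0.1 ms,
# so 10GbE barely helps (~3%) while TB5's lower per-hop latency gives a
# double-digit bump: bandwidth isn't the lever here.
```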

MBCook 14 hours ago

That's llama.cpp, which didn't scale nearly as well in the tests, presumably because it's not optimized yet.

RDMA is always going to have lower overhead than Ethernet, isn't it?

Neywiny 14 hours ago

Possibly RDMA over Thunderbolt. But for RoCE (RDMA over Converged Ethernet), obviously not, because it sits on top of Ethernet. RoCE could still achieve higher throughput once you factor in the CPU time spent running custom protocols that a smart NIC could simply DMA instead, but the on-wire overhead is still definitively higher.

_zoltan_ 3 hours ago

What do you think "Ethernet's overhead" is?

Neywiny 3 hours ago

Header and FCS, interpacket gap, and preamble. What do you think "Ethernet overhead" is?
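For scale, those fixed per-frame costs (standard Ethernet figures: 8 B preamble+SFD, 14 B header with no VLAN tag, 4 B FCS, 12 B interpacket gap) amount to only a few percent of the wire at full-size frames:

```python
# Fixed per-frame Ethernet costs in bytes (standard values, untagged frames).
PREAMBLE, HEADER, FCS, IPG = 8, 14, 4, 12
FIXED = PREAMBLE + HEADER + FCS + IPG  # 38 bytes per frame

for payload in (46, 512, 1500, 9000):  # 46 = minimum payload, 9000 = jumbo
    on_wire = payload + FIXED
    print(f"{payload:>5} B payload: {FIXED / on_wire:6.1%} overhead on the wire")
```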

_zoltan_ 13 minutes ago

I meant in usec, sorry if that wasn't clear, given that the discussion I replied to was about RPC latency.
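Putting that in microseconds (the framing bytes themselves, at each line rate; the usec-scale costs asked about here mostly come from the software stack rather than the wire, as a rough assumption):

```python
# Serialization time for the 38 fixed framing bytes per frame, in usec.
# Kernel TCP round trips on a LAN typically run tens of usec (rough
# reference point, not a measurement), so framing time is a rounding error.
FIXED_BITS = 38 * 8

for name, gbps in [("2.5GbE", 2.5), ("10GbE", 10.0)]:
    usec = FIXED_BITS / (gbps * 1e3)  # bits / (Gbit/s) -> microseconds
    print(f"{name}: {usec:.3f} us of framing per frame")
```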