> Because of this I got a motherboard with slow GPU interconnect. It’s good for running many small experiments in parallel (which is my main use case) but horrible for any models split across gpus.

:( you paid a professional pc builder and you weren't told this?

shout5 an hour ago | parent | next [-]

> paid a professional pc builder

They did not. That's a mining rig not a workstation. It's visible from the photo and the chart showing multiple failures over a short period of time including the risers -- which are visibly very low quality -- failing twice.

You have 50K, you call a real expert like Puget Systems or Digital Storm.

mciancia 2 hours ago | parent | prev | next [-]

I wonder why using 2 PSUs resulted in a slower interconnect.

There are no specs in this blog post regarding the CPU/motherboard choice, but Threadripper Pro has had 128 PCIe lanes for some time now, so running all GPUs at full speed shouldn't be a problem.
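
To make the lane arithmetic concrete, here is a rough sketch. It assumes each GPU wants a full x16 link and that nothing else competes for CPU lanes unless `reserved_lanes` says so; the function names and the 24-lane consumer figure are illustrative assumptions, not specs from the blog post.

```python
def lanes_needed(num_gpus: int, lanes_per_gpu: int = 16) -> int:
    """Total PCIe lanes required to give every GPU a full-width link."""
    return num_gpus * lanes_per_gpu


def fits_at_full_width(num_gpus: int, cpu_lanes: int,
                       lanes_per_gpu: int = 16, reserved_lanes: int = 0) -> bool:
    """True if the CPU exposes enough lanes for all GPUs plus anything reserved."""
    return lanes_needed(num_gpus, lanes_per_gpu) + reserved_lanes <= cpu_lanes


if __name__ == "__main__":
    # 6 GPUs as in the rig under discussion; 128 lanes is the Threadripper Pro figure.
    print(lanes_needed(6))                       # 96
    print(fits_at_full_width(6, cpu_lanes=128))  # True
    # 24 usable lanes is an assumed figure for a consumer desktop CPU.
    print(fits_at_full_width(6, cpu_lanes=24))   # False
```

On those assumptions six x16 cards need 96 lanes, which leaves headroom on a 128-lane platform but not on a consumer one, where the cards end up sharing narrower links.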

zozbot234 2 hours ago | parent | prev | next [-]

If you split models using pipeline/layer parallelism you don't have to care about a slow interconnect, you're just slowed down a lot when running a single inference at a time as opposed to a fully pipelined minibatch. But tensor parallelism requires much faster interconnects than you could get in your average server, so I'm not sure that a different motherboard would help all that much.
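
A back-of-envelope sketch of why the two schemes differ so much. It assumes a transformer where pipeline parallelism forwards one hidden-state vector per token across each stage boundary, and where tensor parallelism does two all-reduces per layer in the forward pass (the Megatron-style layout) using a ring all-reduce. The model dimensions in the usage note are hypothetical, and real frameworks will differ in the details.

```python
def pipeline_bytes_per_token(hidden: int, dtype_bytes: int, num_gpus: int) -> int:
    """Total bytes crossing stage boundaries per token in one forward pass.

    Each of the (num_gpus - 1) boundaries forwards one hidden-state vector.
    """
    return (num_gpus - 1) * hidden * dtype_bytes


def tensor_bytes_per_token(hidden: int, dtype_bytes: int, num_gpus: int,
                           num_layers: int, allreduces_per_layer: int = 2) -> float:
    """Bytes each GPU sends per token in one forward pass.

    A ring all-reduce moves 2 * (N - 1) / N times the payload per participant.
    Two all-reduces per layer is an assumption, not a universal constant.
    """
    ring_factor = 2 * (num_gpus - 1) / num_gpus
    return num_layers * allreduces_per_layer * ring_factor * hidden * dtype_bytes


def sync_points(num_gpus: int, num_layers: int, allreduces_per_layer: int = 2) -> dict:
    """Number of communication steps per forward pass for each scheme."""
    return {
        "pipeline": num_gpus - 1,
        "tensor": num_layers * allreduces_per_layer,
    }
```

For a hypothetical 80-layer model with hidden size 8192 in 16-bit precision on 6 GPUs, `sync_points(6, 80)` gives 5 hand-offs for pipeline parallelism versus 160 all-reduces for tensor parallelism, and the per-token traffic differs by a similar order. Every one of those all-reduces sits on the critical path, which is why link latency and bandwidth hurt tensor parallelism so much more.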

m-hodges 2 hours ago | parent | prev | next [-]

What is a "professional PC builder" in 2026?

ok_dad 2 hours ago | parent [-]

A guy on Facebook with more confidence and better insurance

CamperBob2 2 hours ago | parent | prev | next [-]

Consumer motherboards can still make sense even if you leave some performance on the table. Running an actual 8x GPU server is not something you'd want to do in an apartment. Imagine the old Lucasfilm "THX" trailer where an unearthly-sounding foghorn whine rises to a sweeping crescendo at reference level, only without the decay at the end.

At the time he put this rig together, there weren't a lot of open-weight LLMs that could run well on 6x48=288 GB, so it probably wasn't a huge loss. There still aren't, really.

Right now I'm in the process of cramming Blackwell cards into an old DDR4-based Milan server, where the important thing is to be able to run large models at all. The GPU fans alone burn over 400 watts at full throttle.

storus 2 hours ago | parent [-]

Did you think about Max-Q cards? They run at 300W and aren't that noisy either, with 14% lower perf than the non-Max-Q card.

CamperBob2 2 hours ago | parent [-]

That was an option, but having decided on a true server chassis for other reasons, it made sense to use server-edition cards to take advantage of all those fans. I downclock them to 300W anyway for longevity, but it's nice to have the option to go to 600W if needed. The server is going to live in the garage, so I'm not that concerned with noise. But I had no idea what to expect when I flipped the switch for the first time. It sounds like something out of the Book of Revelation. No way, no how could something like this be used in an inhabited area.

ginko 2 hours ago | parent | prev | next [-]

Don't those Ada 6000 GPUs support NVLink? I think I can even see the cover for the connectors in OP's pic.

edit: Hm, finding mixed information online on whether that's still supported or not. Apparently it was removed in workstation GPUs.

mciancia 2 hours ago | parent | prev [-]

Nope, they don't support it. And afair even if they did, you would be limited to connecting only in pairs, not all 6 together

ryandrake 41 minutes ago | parent [-]

Honestly, I made the same mistake when I added a GPU to my (not $48K) existing homelab. I got an Ada 4000 for its slim form factor and low wattage, but realized after I bought it that it does not support NVLink, so I can't really effectively double it up later if I wanted to. Live and learn. I suppose you might research that a little before blowing that much money though LOL :)
