jacquesm a day ago

All are PCIe 3.0, I wasn't aware of those switches at all, in spite of buying my risers and cables from that source! Unfortunately all of the slots on the board are x8, there are no x16 slots at all.

So that switch would probably work but I wonder how big the benefit would be: you will probably see effectively an x4 -> (x4 / x8) -> (x8 / x8) -> (x8 / x8) -> (x8 / x4) -> x4 pipeline, and then on to the next set of four boards.
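As a back-of-the-envelope check, here is a sketch of what that pipeline's per-hop ceilings look like, assuming PCIe 3.0's roughly 0.985 GB/s of usable bandwidth per lane (8 GT/s with 128b/130b encoding; real throughput is a bit lower once protocol overhead is counted). The hop widths are taken from the pipeline above; the narrowest link on each hop sets that hop's ceiling.

```python
# Rough per-hop bandwidth estimate for a PCIe 3.0 pipeline.
# Assumes ~0.985 GB/s usable per lane (8 GT/s, 128b/130b encoding).
GBPS_PER_LANE = 0.985

def hop_bandwidth(lanes: int) -> float:
    """Peak one-way bandwidth in GB/s for a link of the given width."""
    return lanes * GBPS_PER_LANE

# Effective width of each hop is the minimum of the two link widths:
# (x4/x8), (x8/x8), (x8/x8), (x8/x4).
hops = [min(4, 8), min(8, 8), min(8, 8), min(8, 4)]
bandwidths = [hop_bandwidth(w) for w in hops]
bottleneck = min(bandwidths)

print([round(b, 2) for b in bandwidths])  # [3.94, 7.88, 7.88, 3.94]
print(round(bottleneck, 2))               # 3.94 -- the x4 hops dominate
```

Which is consistent with the intuition above: the two x4 hops at the edges set the ceiling, and the x8 middle hops have headroom to spare.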

It might run faster on account of the three passes that would run at double their current speed, as long as the CPU does not need to talk to those cards and all transfers are between layers on adjacent cards (very likely). With even more luck (due to timing and lack of overlap) it might run the two x4 passes at approaching x8 speeds as well. And then of course you need to do this a couple of times because four cards isn't enough, so you'd need four of those switches.

I have not tried having a single card with fewer lanes in the pipeline, but that should be an easy test to see what effect such a constriction has on throughput.

But now you have me wondering to what extent I could bundle 2 x8 into an x16 slot and then use four of these cards inserted into a fifth! That would be an absolutely unholy assembly, but it has the advantage that you'd need far fewer risers: just one x16-to-x8/x8 run in reverse. I have no idea whether that's even possible, but I see no immediate reason why it would not work, unless there are more driver chips between the slots and the CPUs, which may be the case for some of the farthest slots.

PCIe is quite amazing in terms of the topology tricks that you can pull off with it, and c-payne's stuff is extremely high quality.

nonplus a day ago | parent [-]

If you end up trying it please share your findings!

I've basically been putting this kind of gear in my cart, then deciding I don't want to manage more than the two 3090s, the 4090 and the A5000 I have now, and taking the PLX back out of my cart.

Seeing as you already have the cards, it could be a good fit!

jacquesm a day ago | parent [-]

Yes, it could be. Unfortunately I'm a bit distracted by both paid work and some more urgent stuff but eventually I will get back to it. By then this whole rig might be hopelessly outdated but we've done some fun experiments with it and have kept our confidential data in-house which was the thing that mattered to me.

r0b05 a day ago | parent [-]

Yes, the privacy is amazing, and there's no rate limiting so you can be as productive as you want. There are also tons of learnings in this exercise. I have just 2x 3090s and I've learnt so much about PCIe and hardware that it makes the creative process that much more fun.

The next iteration of these tools will likely be more efficient so we should be able to run larger models at a lower cost. For now though, we'll run nvidia-smi and keep an eye on those power figures :)
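For keeping an eye on those power figures programmatically, here is a minimal sketch. The `--query-gpu`/`--format=csv` invocation is a standard nvidia-smi query (check `nvidia-smi --help-query-gpu` for the fields your driver version supports); the sample output and its numbers are made up for illustration.

```python
import subprocess

# Standard nvidia-smi CSV query for per-GPU power draw and power limit.
CMD = ["nvidia-smi", "--query-gpu=index,power.draw,power.limit",
       "--format=csv,noheader,nounits"]

def parse_power(csv_text: str) -> list[tuple[int, float, float]]:
    """Parse 'index, draw, limit' CSV lines into (index, watts, limit) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        idx, draw, limit = (field.strip() for field in line.split(","))
        rows.append((int(idx), float(draw), float(limit)))
    return rows

def read_power() -> list[tuple[int, float, float]]:
    """Run nvidia-smi and return (index, power draw W, power limit W) per GPU."""
    out = subprocess.run(CMD, capture_output=True, text=True, check=True).stdout
    return parse_power(out)

# Parsing demonstrated on sample output (hypothetical numbers):
sample = "0, 187.42, 350.00\n1, 143.10, 350.00\n"
print(parse_power(sample))  # [(0, 187.42, 350.0), (1, 143.1, 350.0)]
```

Polling this in a loop (or with nvidia-smi's own `-l <seconds>` flag) gives you the power trace to average over a run.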

jacquesm 21 hours ago | parent [-]

You can tune that power down to whatever gives you the best token count per joule, which I think is a very important metric by which to optimize these systems, and by which you can compare them as well.
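The metric itself is a one-liner: watts are joules per second, so measured throughput divided by average power gives tokens per joule. A sketch, with hypothetical numbers:

```python
def tokens_per_joule(tokens_per_second: float, avg_power_watts: float) -> float:
    """Energy efficiency: tokens generated per joule consumed.
    Watts are joules/second, so the seconds cancel out."""
    return tokens_per_second / avg_power_watts

# Hypothetical run: 35 tok/s on four cards power-limited to 250 W each.
eff = tokens_per_joule(35.0, 4 * 250.0)
print(round(eff, 3))  # 0.035 tokens/J, i.e. ~28.6 J per token
```

Sweeping the power limit and recording this number at each setting finds the knee of the curve for a given rig, and the figure is directly comparable across systems.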

I have a hard time understanding all of these companies that toss their NDAs and client confidentiality to the wind and feed newfangled AI companies their corporate secrets with abandon. You'd think there would be a more prudent approach to this.