yjftsjthsd-h 4 days ago

> 3) The shopping link for the mainboard leads to the "ASUS ROG Strix X670E-E Gaming" model. This model can use the 2nd PCIe 5.0 port at only x4 speeds. The RTX 3090 can only do PCIe 4.0 of course so it will run at PCIe 4.0 x4. If you choose a desktop mainboard for having two GPUs, make sure it can run at PCIe x8 speeds when using both GPU slots! Having NVLink between the GPUs is not a replacement for having a fast connection between the CPU+RAM and the GPU and its VRAM.

Forgive a noob question: I thought the connection to the GPU was fairly unimportant once the model was loaded, because sending input to the model and getting a response back is low bandwidth? So it might matter if you're changing models a lot or running a model that works on video, but otherwise I thought it didn't really matter.

Tepix 3 days ago

In general, if all you do is inference with a model that’s entirely in VRAM, you’re right. OTOH, avoiding the limitation is simply a matter of picking the right mainboard. If you run one of those sweet new MoE models that won’t completely fit in your VRAM, you have to offload part of it, and then PCIe bandwidth becomes the bottleneck. Swapping between LLMs will also be faster over a wider link.
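
To put rough numbers on the x4 vs. x8 difference, here is a back-of-the-envelope sketch. It uses the theoretical PCIe 4.0 rate (16 GT/s per lane, 128b/130b encoding) and assumes a hypothetical 24 GB of weights, i.e. about what fills one RTX 3090. The function names are made up for illustration, and real-world throughput will be lower than these peak figures.

```python
# Theoretical PCIe 4.0 throughput per lane, per direction:
# 16 GT/s with 128b/130b encoding, divided by 8 bits per byte (~1.97 GB/s).
PCIE4_GB_PER_S_PER_LANE = 16 * (128 / 130) / 8


def link_bandwidth_gb_per_s(lanes: int) -> float:
    """Peak one-way bandwidth of a PCIe 4.0 link with the given lane count."""
    return PCIE4_GB_PER_S_PER_LANE * lanes


def transfer_seconds(size_gb: float, lanes: int) -> float:
    """Best-case time to move size_gb over the link (ignores all overhead)."""
    return size_gb / link_bandwidth_gb_per_s(lanes)


if __name__ == "__main__":
    size_gb = 24.0  # assumption: weights that roughly fill one RTX 3090
    for lanes in (4, 8, 16):
        bw = link_bandwidth_gb_per_s(lanes)
        t = transfer_seconds(size_gb, lanes)
        print(f"PCIe 4.0 x{lanes}: {bw:.1f} GB/s peak, {t:.1f} s for {size_gb:.0f} GB")
```

So in the best case, loading 24 GB takes about 3 seconds at x4 versus about 1.5 seconds at x8. That hardly matters for a one-time model load, but with offloading, weights may have to cross the link repeatedly during generation, so the same factor of two can show up in tokens per second.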