GeekyBear 16 hours ago

We've seen many recent projects that stream models directly from an SSD into a discrete GPU's limited VRAM on PCs.

How big a bottleneck is Thunderbolt 5 compared to an SSD? Is the 120 Gbps mode only available when linked to a monitor?

manmal 16 hours ago

That’s what, 14 GB/s? The GPU’s VRAM can do 100x that.
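A rough sketch of the unit conversion behind that comparison. It assumes Thunderbolt 5's 80 Gbps symmetric rate and 120 Gbps boosted rate, ignores protocol and encoding overhead, and uses about 1,000 GB/s as an assumed ballpark for a high-end consumer GPU's VRAM bandwidth (not a figure from the thread):

```python
# Back-of-the-envelope link vs. VRAM bandwidth comparison.
# Assumptions: raw line rates only (no protocol/encoding overhead),
# decimal units (1 GB = 1e9 bytes), and ~1,000 GB/s of VRAM bandwidth
# as a ballpark for a high-end consumer GPU.

def gbps_to_gb_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8

TB5_SYMMETRIC_GBPS = 80
TB5_BOOST_GBPS = 120
VRAM_GB_PER_S = 1000  # assumed ballpark, not a measured figure

for label, gbps in [("TB5 symmetric", TB5_SYMMETRIC_GBPS), ("TB5 boost", TB5_BOOST_GBPS)]:
    gb_s = gbps_to_gb_per_s(gbps)
    ratio = VRAM_GB_PER_S / gb_s
    print(f"{label}: {gb_s:.1f} GB/s, VRAM is ~{ratio:.0f}x faster")
```

Even at the boosted rate the link tops out around 15 GB/s before overhead, so the gap to on-card memory is on the order of 65x to 100x depending on the card.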

GeekyBear 16 hours ago

A discrete consumer GPU card doesn't have enough fast RAM to run a very large model that hasn't been quantized to hell.

That's why all the projects streaming models into the GPU from an SSD popped up recently.

manmal 13 hours ago

Yes. There’s just no way to get above 1 token/s that way with a large model.
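A minimal sketch of why streaming caps decode speed so low. It assumes a dense model (every weight is read once per generated token), that whatever doesn't fit in VRAM must be re-streamed for every token, and it ignores compute and latency, so the result is a ceiling rather than a prediction. The function name and the example numbers (a hypothetical 70B model at 8-bit with 24 GB resident) are illustrative, not from the thread:

```python
# Upper bound on decode speed when weights are streamed over a link.
# Assumptions: dense model, every weight read once per token; the
# non-resident part is re-streamed each token; compute and latency ignored.

def max_tokens_per_second(params_billion: float, bytes_per_param: float,
                          resident_vram_gb: float, link_gb_per_s: float) -> float:
    """Ceiling on tokens/s imposed by link bandwidth alone (hypothetical helper)."""
    model_gb = params_billion * bytes_per_param  # billions of params * bytes = GB
    streamed_gb = max(model_gb - resident_vram_gb, 0.0)
    if streamed_gb == 0.0:
        return float("inf")  # model fits in VRAM; the link is not the bottleneck
    return link_gb_per_s / streamed_gb

# Hypothetical: 70B dense model at 8-bit, 24 GB resident in VRAM,
# streamed over a ~15 GB/s link (120 Gbps / 8).
print(f"{max_tokens_per_second(70, 1.0, 24, 15):.2f} tokens/s")  # → 0.33 tokens/s
```

Mixture-of-experts models read only their active experts per token, so this dense-model bound would be pessimistic for them.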