manmal 18 hours ago
Maybe I’m lacking imagination, but how will a GPU with small-ish but fast VRAM and great compute augment a Mac with large but slow VRAM and weak compute? I'd guess the interconnect isn’t fast enough to swap layers on the GPU rapidly.
zozbot234 17 hours ago
> But how will a GPU with small-ish but fast VRAM and great compute, augment a Mac with large but slow VRAM and weak compute?

It would work just like a discrete GPU does in CPU+GPU inference: you'd run a few shared layers on the discrete GPU and keep the rest in unified memory. You'd want to minimize CPU/GPU transfers even more than usual, since a Thunderbolt connection only gives you throughput equivalent to PCIe 4.0 x4.
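To put a rough number on that, here is a back-of-the-envelope sketch in Python. It assumes the link behaves like an ideal PCIe 4.0 x4 connection (16 GT/s per lane, 128b/130b encoding) with no protocol overhead, so real Thunderbolt throughput would be lower. The 20 GB payload is a made-up figure for illustration, and the function names are mine.

```python
def pcie_throughput_gb_per_s(gt_per_s: float, lanes: int) -> float:
    """Theoretical one-direction PCIe throughput in GB/s.

    Assumes 128b/130b line encoding (PCIe 3.0 and later) and ignores
    packet and protocol overhead, so this is an upper bound.
    """
    return gt_per_s * lanes * (128 / 130) / 8


def transfer_seconds(payload_gb: float, link_gb_per_s: float) -> float:
    """Time to move payload_gb over a link at its theoretical peak rate."""
    return payload_gb / link_gb_per_s


# PCIe 4.0 runs at 16 GT/s per lane; x4 is the width mentioned above.
link = pcie_throughput_gb_per_s(16.0, 4)

# Hypothetical payload: 20 GB of layer weights swapped onto the GPU.
swap_time = transfer_seconds(20.0, link)

print(f"link: {link:.2f} GB/s, one 20 GB swap: {swap_time:.2f} s")
```

Under these assumptions a single swap of that size takes seconds rather than milliseconds, which is why you'd keep a fixed set of layers resident on the GPU instead of rotating them per token.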
arjie 17 hours ago
My Mini is actually the smallest model, so it has "small but slow VRAM" (haha!), and what I want the GPU for is the smaller Gemmas or Qwens. Realistically, I'll probably run on an RTX 6000 Pro, but this might be fun for home.
GeekyBear 17 hours ago
We've seen many recent projects that stream models directly from SSD into a discrete GPU's limited VRAM on PCs. How big a bottleneck is Thunderbolt 5 compared to an SSD? Is the 120 Gbps mode only available when linked to a monitor?