seanmcdirmid 9 days ago

> For reference, the RTX 5080 (a consumer GPU) has 1tb of VRAM bandwidth and runs circles around the M5 Max in GPU compute benchmarks: https://browser.geekbench.com/opencl-benchmarks

NVIDIA hampers its GPUs with non-unified graphics memory, while the M series can use everything the computer has (well, you need to save 4GB or so). It also works on airplanes and in hotel rooms. A cheap NVIDIA server box with 64GB of RAM (what my M3 Max laptop has)... how cheap is that?

andriy_koval 9 days ago

I think the non-unified memory issue is solved by a software layer in a datacenter setting: the model is distributed across multiple GPUs in the same server, or across multiple servers if the model is extra large.
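
A rough sketch of that idea, not how any particular framework does it: split a model's layers into consecutive groups so each group fits in one GPU's VRAM. The function name `shard_layers` and all the sizes are made up for illustration, and real systems also have to budget for activations, KV cache, and inter-GPU transfer.

```python
def shard_layers(layer_gb, gpu_vram_gb):
    """Greedily assign consecutive layers to GPUs so each shard fits in VRAM.

    layer_gb    -- per-layer weight sizes in GB (hypothetical numbers)
    gpu_vram_gb -- usable VRAM per GPU in GB (assumed identical GPUs)
    Returns a list of shards, each a list of layer indices.
    """
    shards = [[]]
    used = 0.0
    for i, size in enumerate(layer_gb):
        if size > gpu_vram_gb:
            raise ValueError(f"layer {i} ({size} GB) does not fit on one GPU")
        if used + size > gpu_vram_gb:
            # Current GPU is full, so start a new shard on the next GPU.
            shards.append([])
            used = 0.0
        shards[-1].append(i)
        used += size
    return shards


# Hypothetical model: 8 layers of 6 GB each, on GPUs with 16 GB of VRAM.
shards = shard_layers([6.0] * 8, 16.0)
print(len(shards), shards)
```

With those made-up numbers each GPU holds two layers, so the model needs four GPUs; once the shard count exceeds the GPUs in one box, the remaining shards go to another server.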