bigyabai 9 days ago
They're honestly not competitive for inference, which is why datacenters largely ignore Apple Silicon. Even the M5 Max is still bottlenecked on dense models by its relatively weak GPU and a paltry ~500-600 GB/s of GPU memory bandwidth. For reference, the RTX 5080 (a consumer GPU) has ~1 TB/s of VRAM bandwidth and runs circles around the M5 Max in GPU compute benchmarks: https://browser.geekbench.com/opencl-benchmarks

Even for home inference, it's hard to recommend a dedicated Mac over a cheap Nvidia server box.

> They are probably the only ones that have the talent, resources, and capital to do that.

Apple invented OpenCL. The problem was their reluctance to work with the rest of the industry, and once CUDA took over it was too late for them to even try.
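To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch. It assumes single-stream decoding of a dense model is purely memory-bound, so every generated token reads all weights once; it ignores KV-cache traffic, compute limits, and batching. The function name and the 8B-parameter, 8-bit model are illustrative assumptions, not figures from this thread.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float,
                            params_billion: float,
                            bytes_per_param: float) -> float:
    """Rough upper bound on decode speed for a dense model.

    Assumes each token requires streaming every weight from memory once,
    so throughput <= bandwidth / model size. Real numbers will be lower.
    """
    model_gb = params_billion * bytes_per_param  # weights only, in GB
    return bandwidth_gb_s / model_gb


# Hypothetical model: 8B parameters at 1 byte per parameter (8 GB of weights).
# 550 GB/s is the midpoint of the ~500-600 GB/s range quoted above;
# 1000 GB/s stands in for the ~1 TB/s figure.
for label, bandwidth in [("~550 GB/s", 550.0), ("~1000 GB/s", 1000.0)]:
    bound = max_decode_tokens_per_s(bandwidth, params_billion=8.0, bytes_per_param=1.0)
    print(f"{label}: at most {bound:.2f} tokens/s")
```

Under these assumptions the ratio of the two bounds is just the ratio of the bandwidths, which is why bandwidth dominates the comparison for dense models that fit in memory.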
seanmcdirmid 9 days ago
> For reference, the RTX 5080 (a consumer GPU) has ~1 TB/s of VRAM bandwidth and runs circles around the M5 Max in GPU compute benchmarks: https://browser.geekbench.com/opencl-benchmarks

NVIDIA hampers its GPUs with non-unified graphics memory, while the M series can use nearly everything the computer has (well, you need to reserve 4 GB or so). It also works on airplanes and in hotel rooms. And a cheap NVIDIA server box with 64 GB of RAM (what my M3 Max laptop has)... how cheap is that?
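The capacity side of this can be sketched the same way. The check below is a minimal illustration, not a sizing tool: the 70B-parameter 4-bit model and the 2 GB overhead allowance are hypothetical, the 16 GB figure is the RTX 5080's VRAM size (a spec not stated in this thread), and the 64 GB with ~4 GB reserved comes from the comment above.

```python
def fits_in_memory(params_billion: float,
                   bytes_per_param: float,
                   total_gb: float,
                   reserved_gb: float = 0.0,
                   overhead_gb: float = 2.0) -> bool:
    """Return True if the weights plus a fixed overhead fit in usable memory.

    overhead_gb is an assumed allowance for KV cache and runtime buffers.
    """
    needed_gb = params_billion * bytes_per_param + overhead_gb
    usable_gb = total_gb - reserved_gb
    return needed_gb <= usable_gb


# Hypothetical dense model: 70B parameters at 4 bits (0.5 bytes) per parameter.
print(fits_in_memory(70.0, 0.5, total_gb=64.0, reserved_gb=4.0))  # 64 GB unified memory
print(fits_in_memory(70.0, 0.5, total_gb=16.0))                   # 16 GB of VRAM
```

A model that does not fit in VRAM has to be split across GPUs or partly offloaded to system RAM, at which point the headline VRAM bandwidth no longer applies to the whole model.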