aurareturn 2 days ago
$7.2k just to run, at best, Qwen3.5-35B-A3B doesn't seem worth it at all. This is certainly not the most effective use of $7k for running local LLMs. The answer is a 16" M5 Max 128GB for $5k. You can run much bigger models than your setup while having an awesome portable machine for everything else.
emidoots 2 days ago | parent
Performance (tok/s and prompt processing) or quality (model size)? Pick one.

In terms of GPU memory bandwidth (for models fitting in the ~48GB of an RTX 5000 Pro card), the RTX card I described above has over 2x the bandwidth of an M5 Max. If leveraging system RAM (the 128GB-256GB outside the GPU) to run larger models, then the memory bandwidth is ~6x slower than the M5 Max.

So for models fitting in the ~48GB of RTX memory, like dense Qwen3.5 27B models, the RTX will be 2-4x faster than an M5 Max. For models that don't fit in the 48GB of RTX memory, the M5 Max will be 5-20x faster.

Also worth considering future upgrades: do you plan to throw away the machine in a few years, or pick up multiple used RTX 6000 Pro cards when people start ditching them?
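The bandwidth comparison above can be sketched with a rough back-of-envelope model: token generation is typically memory-bandwidth-bound, so an upper bound on decode speed is bandwidth divided by bytes read per token. The bandwidth and model-size figures below are illustrative assumptions for the comparison, not measured specs for any of these machines.

```python
# Rough decode-speed estimate: generation is memory-bandwidth-bound, so
# tok/s <= bandwidth / bytes read per token (active params x bytes/param).
# All numbers below are illustrative assumptions, not vendor specs.

def decode_toks_per_sec(bandwidth_gb_s: float,
                        active_params_b: float,
                        bytes_per_param: float = 1.0) -> float:
    """Upper-bound tokens/sec if every active weight is read once per token."""
    return bandwidth_gb_s / (active_params_b * bytes_per_param)

# Dense 27B model at 8-bit quantization: ~27 GB of weights read per token.
gpu_vram   = decode_toks_per_sec(1300, 27)  # assumed ~1.3 TB/s GPU VRAM
unified    = decode_toks_per_sec(550, 27)   # assumed ~550 GB/s unified memory
system_ram = decode_toks_per_sec(90, 27)    # assumed ~90 GB/s dual-channel RAM

print(f"GPU VRAM: {gpu_vram:.0f} tok/s, "
      f"unified: {unified:.0f} tok/s, "
      f"system RAM: {system_ram:.0f} tok/s")
```

Under these assumptions the GPU is a bit over 2x the unified memory while it fits, and unified memory is ~6x the system-RAM fallback once it doesn't, which is the shape of the tradeoff described above.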