aurareturn 2 days ago
$7.2k just to run, at best, Qwen3.5-35B-A3B doesn't seem worth it at all. This is certainly not the most effective use of $7k for running local LLMs. The answer is a 16" M5 Max 128GB for $5k. You can run much bigger models than your setup while having an awesome portable machine for everything else.
emidoots 2 days ago | parent
Performance (tok/s and prompt processing) or quality (model size)? Pick one.

In terms of GPU memory bandwidth (for models fitting in the ~48GB of an RTX 5000 Pro card), the RTX card I described above has over 2x the bandwidth of an M5 Max. If leveraging system RAM (the 128GB-256GB outside the GPU) to run larger models, then the memory bandwidth is ~6x slower than the M5 Max.

So for models fitting in the ~48GB of RTX memory, like dense Qwen3.5 27B models, the RTX will be 2-4x faster than an M5 Max. For models that don't fit in the 48GB of RTX memory, the M5 Max will be 5-20x faster.

Also worth considering future upgrades: do you plan to throw away the machine in a few years, or pick up multiple used RTX 6000 Pro cards when people start ditching them?
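The bandwidth comparison above can be sketched with a rough back-of-envelope model: token generation is typically memory-bandwidth-bound, so an upper bound on decode speed is bandwidth divided by bytes read per token. The bandwidth and model-size figures below are illustrative assumptions for the comparison, not measured specs for any of these machines.

```python
# Rough decode-speed estimate: generation is memory-bandwidth-bound, so
# tok/s <= bandwidth / bytes read per token (active params x bytes/param).
# All numbers below are illustrative assumptions, not vendor specs.

def decode_toks_per_sec(bandwidth_gb_s: float,
                        active_params_b: float,
                        bytes_per_param: float = 1.0) -> float:
    """Upper-bound tokens/sec if every active weight is read once per token."""
    return bandwidth_gb_s / (active_params_b * bytes_per_param)

# Dense 27B model at 8-bit quantization: ~27 GB of weights read per token.
gpu_vram   = decode_toks_per_sec(1300, 27)  # assumed ~1.3 TB/s GPU VRAM
unified    = decode_toks_per_sec(550, 27)   # assumed ~550 GB/s unified memory
system_ram = decode_toks_per_sec(90, 27)    # assumed ~90 GB/s dual-channel RAM

print(f"GPU VRAM: {gpu_vram:.0f} tok/s, "
      f"unified: {unified:.0f} tok/s, "
      f"system RAM: {system_ram:.0f} tok/s")
```

Under these assumptions the GPU is a bit over 2x the unified memory while it fits, and unified memory is ~6x the system-RAM fallback once it doesn't, which is the shape of the tradeoff described above.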