	▲	UncleOxidant 4 hours ago
		Not who you asked, but I've got a Framework Desktop (Strix Halo) with 128GB RAM. On Linux, up to about 112GB can be allocated to the GPU. I can run Qwen3.5-122B (4-bit quant) quite easily on this box. I find qwen3-coder-next (80B params, MoE) runs quite well at about 36 tok/sec. Qwen3.5-27b is a bit slower at about 24 tok/sec, but that's a dense model.
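
For a rough sanity check on why a 122B model at 4-bit fits in that budget, here is a back-of-the-envelope sketch. It counts weights only, at a flat 4 bits per weight. Real 4-bit quant formats store extra scale data, and the KV cache and runtime buffers add more on top, so treat these numbers as a lower bound. The helper name `weight_footprint_gb` is made up for illustration, and applying 4-bit to the two smaller models is an assumption, since the comment only states the quantization for the 122B model.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes).

    Weights only: ignores quantization scales, KV cache, and runtime buffers.
    """
    return params_billions * bits_per_weight / 8


# From the comment: roughly how much memory Linux lets the GPU use on this box.
GPU_BUDGET_GB = 112

# Parameter counts (in billions) as given in the comment.
# Assumption: 4 bits per weight for all three; only the 122B model is stated as 4-bit.
models = [("Qwen3.5-122B", 122), ("qwen3-coder-next", 80), ("Qwen3.5-27b", 27)]

for name, params_b in models:
    size_gb = weight_footprint_gb(params_b, 4)
    fits = size_gb < GPU_BUDGET_GB
    print(f"{name}: ~{size_gb:.1f} GB of weights, fits in {GPU_BUDGET_GB} GB: {fits}")
```

By this estimate the 122B model needs about 61 GB for weights, which leaves a fair amount of the 112GB for context and overhead.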