bildung | 8 hours ago
I currently run qwen3.5-122B (Q4) on a Strix Halo (Bosgame M5) and am pretty happy with it. Obviously much slower than hosted models: I get ~20 t/s with an empty context and am down to about 14 t/s with 100k of context filled. No tuning at all, just apt install rocm and rebuilding llama.cpp every week or so.
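For anyone curious, the weekly rebuild is roughly the standard llama.cpp HIP build, something like this (a sketch, not my exact commands; GGML_HIP and the gfx1151 target are the usual ones for Strix Halo, and the model path is just a placeholder):

    # from the llama.cpp checkout: pull latest and rebuild with ROCm/HIP
    # (gfx1151 is the Strix Halo iGPU architecture)
    git pull
    cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release
    cmake --build build --config Release -j

    # serve the Q4 GGUF, offloading all layers to the iGPU with a large context
    ./build/bin/llama-server -m /path/to/model-q4.gguf -ngl 99 -c 102400

The unified memory on Strix Halo is what makes the large context workable at all; the -ngl 99 just says "put every layer on the GPU".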