bildung | 8 hours ago
I currently run qwen3.5-122B (Q4) on a Strix Halo (Bosgame M5) and am pretty happy with it. Obviously much slower than hosted models: I get ~20 t/s with an empty context and am down to about 14 t/s with 100k of context filled. No tuning at all, just apt install rocm and rebuilding llama.cpp every week or so.
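For anyone curious, the weekly rebuild is roughly the standard llama.cpp HIP build, something like this (a sketch, not my exact commands; GGML_HIP and the gfx1151 target are the usual ones for Strix Halo, and the model path is just a placeholder):

    # from the llama.cpp checkout: pull latest and rebuild with ROCm/HIP
    # (gfx1151 is the Strix Halo iGPU architecture)
    git pull
    cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release
    cmake --build build --config Release -j

    # serve the Q4 GGUF, offloading all layers to the iGPU with a large context
    ./build/bin/llama-server -m /path/to/model-q4.gguf -ngl 99 -c 102400

The unified memory on Strix Halo is what makes the large context workable at all; the -ngl 99 just says "put every layer on the GPU".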