UncleOxidant 4 hours ago
Not who you asked, but I've got a Framework Desktop (Strix Halo) with 128GB RAM. On Linux, up to about 112GB can be allocated to the GPU. I can run Qwen3.5-122B (4-bit quant) quite easily on this box. I find qwen3-coder-next (80B params, MoE) runs quite well at about 36 tok/sec. Qwen3.5-27b is a bit slower at about 24 tok/sec, but that's a dense model.
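
For a rough sense of why the 122B model fits, here is a back-of-the-envelope sketch of the weight sizes. It assumes exactly 4 bits per weight for all three models (only the 122B is stated to be a 4-bit quant; real quant formats also add some overhead for scales), counts 1 GB as 10^9 bytes, and ignores KV cache and activations, so treat the numbers as lower bounds. The function name and constant are made up for illustration.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GB (10^9 bytes)."""
    # params * bits / 8 gives bytes; the 10^9 factors cancel out.
    return params_billions * bits_per_weight / 8


# GPU-allocatable memory quoted above (approximate).
GPU_BUDGET_GB = 112

# Parameter counts taken from the model names and figures in the comment.
MODELS = [
    ("Qwen3.5-122B", 122),
    ("qwen3-coder-next", 80),
    ("Qwen3.5-27b", 27),
]

for name, params_b in MODELS:
    size = weight_footprint_gb(params_b, bits_per_weight=4)  # assumed 4-bit
    headroom = GPU_BUDGET_GB - size
    print(f"{name}: ~{size:.1f} GB weights, ~{headroom:.1f} GB left over")
```

Under those assumptions the 122B model needs roughly 61 GB for weights, which leaves a fair amount of the 112GB for context.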