dennemark 7 hours ago
I have two Strix Halo devices at hand: privately a Framework Desktop with 128GB, and at work a 64GB HP notebook. The 64GB machine can load Qwen3.5 30B-A3B; with VSCode it needs a bit of initial prompt processing, I guess to initialize all those tools. But the model fights with the other things I need the resources for, so I am not really using it anymore these days. I want to experiment with it on my home machine, I just don't work on that one much right now. Lemonade has a web UI to set the context size and llama.cpp args; you need to set the context to a proper number, or just to 0 so that it uses the default. If it's too low, it won't work with agentic coding. I will try some Claw app, but first I need to research the field a bit. But I am using different models in Open WebUI: GPT 120B is fast, but Qwen3.5 27B is also fine.
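To make the context setting concrete, here is a minimal sketch of the llama.cpp argument involved. It assumes the args field is passed through to llama.cpp's server, where `--ctx-size 0` means the context length is taken from the model; the helper name `llama_ctx_args` and the value 65536 are only illustrations, not anything Lemonade defines.

```python
def llama_ctx_args(ctx_size: int = 0) -> list[str]:
    """Build the llama.cpp context-size argument.

    ctx_size == 0 tells llama.cpp to take the context length from the model.
    Any positive value is an explicit token budget.
    """
    if ctx_size < 0:
        raise ValueError("ctx_size must be >= 0")
    return ["--ctx-size", str(ctx_size)]


# Default: let the model decide.
print(" ".join(llama_ctx_args()))

# Explicit budget (65536 is a hypothetical value, pick what fits your RAM).
print(" ".join(llama_ctx_args(65536)))
```

Agentic tools send long system prompts and tool definitions up front, so a small explicit value is the case to avoid.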
cpburns2009 7 hours ago
Qwen3-Coder-Next works well on my 128GB Framework Desktop. It seems better at coding Python than Qwen3.5 35B-A3B, and it's not too much slower (43 tg/s compared to 55 tg/s at Q4). 27B is supposed to be really good but it's so slow I gave up on it (11-12 tg/s at Q4).
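To put those generation rates in perspective, a rough sketch of the wait for a single reply. The 1000-token reply length is a made-up figure, and 11.5 is just the midpoint of the 11-12 range; prompt processing time is ignored.

```python
def seconds_for(tokens: int, rate: float) -> float:
    """Time in seconds to generate `tokens` at `rate` tokens per second."""
    return tokens / rate


# Generation rates at Q4 as quoted above (tokens per second).
rates = {
    "Qwen3.5 35B-A3B": 55.0,
    "Qwen3-Coder-Next": 43.0,
    "Qwen3.5 27B": 11.5,  # assumed midpoint of 11-12
}

REPLY_TOKENS = 1000  # hypothetical reply length
baseline = rates["Qwen3.5 35B-A3B"]

for name, rate in rates.items():
    wait = seconds_for(REPLY_TOKENS, rate)
    print(f"{name}: {wait:.0f} s per reply, {rate / baseline:.0%} of 35B-A3B speed")
```

Under those assumptions the wait is roughly 18 s, 23 s and 87 s respectively, which matches how it feels: Coder-Next is a small step down, 27B is a different league.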