Aurornis 7 hours ago

I've also been using Qwen3.5-27B and the new Qwen3.6 locally, both at Q6. I don't agree that they're as good as pre-Opus Claude. I really like how much they can do on my local hardware, but in my opinion we have a long way to go before we reach parity with even pre-Opus Claude.

wizee 4 hours ago | parent | next [-]

I run Qwen 3.5 122B-A10B on my MacBook Pro, and in my experience its capability level for programming and code-comprehension tasks is roughly that of Claude Sonnet 3.7. Honestly I find that pretty amazing, having something roughly equivalent to frontier models of a year ago running locally on my laptop for free. I’m eager to try Qwen 3.6 122B-A10B when it’s released.

_fizz_buzz_ 5 hours ago | parent | prev [-]

What hardware do you use? I want to experiment with running models locally.

threecheese 4 hours ago | parent | next [-]

OP’s Qwen3.6 27B at Q6 appears to be north of 20GB on Hugging Face, and should run on an Apple Silicon Mac with 32GB of RAM. Smaller models work unreasonably well even on my M1/64GB MacBook.
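That 20GB figure lines up with a simple weight-only estimate. A minimal sketch (the ~6.5 bits/weight figure for a Q6-style quant is an assumption; KV cache and runtime overhead add a few GB on top):

```python
def model_weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weight-only footprint of a quantized model, in GB.

    Ignores KV cache, activations, and runtime overhead, which add
    several more GB on top of this number.
    """
    # params * (bits / 8) bytes; the 1e9 factors for params and GB cancel out.
    return params_billions * bits_per_weight / 8

# Assuming ~6.5 effective bits/weight for a Q6-style quant of a 27B model:
print(round(model_weights_gb(27, 6.5), 1))  # → 21.9
```

Which is why a 27B at Q6 is tight but workable on a 32GB machine, while Q4 (~4.5 bits/weight) leaves noticeably more headroom.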

I am getting 10 tok/sec on Qwen3.5 27B (thinking, Q4, 18GB) on an M4/32GB Mac Mini. It’s slow.

For a 9B model (much smaller, non-thinking) I am getting 30 tok/sec, which is fast enough for regular use if you just need something from the training data (like how to use grep, or Hemingway's favorite cocktail).

I’m using LM Studio, which is very easy to use and free (as in beer).
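One nice thing about LM Studio is that its local server speaks an OpenAI-compatible API, so whatever model it has loaded can be scripted. A hedged sketch, assuming the default port (1234) and a placeholder model identifier — substitute whatever name LM Studio lists for your loaded model:

```python
import json
import urllib.request

# Chat-completions payload in the OpenAI-compatible format the local server accepts.
payload = {
    "model": "qwen3.5-27b",  # hypothetical identifier; use the name shown in LM Studio
    "messages": [{"role": "user", "content": "How do I grep recursively?"}],
    "temperature": 0.2,
}

def query_local(url: str = "http://localhost:1234/v1/chat/completions") -> str:
    """POST the payload to the local server; requires LM Studio's server to be running."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Since it mirrors the OpenAI API shape, existing client libraries pointed at `localhost:1234/v1` generally work too.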

UncleOxidant 4 hours ago | parent | prev [-]

Not who you asked, but I've got a Framework Desktop (Strix Halo) with 128GB RAM. On Linux, up to about 112GB can be allocated to the GPU. I can run Qwen3.5-122B (4-bit quant) quite easily on this box. I find qwen3-coder-next (80B params, MoE) runs quite well at about 36 tok/sec. Qwen3.5-27B is a bit slower at about 24 tok/sec, but that's a dense model.
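The MoE-beats-bigger-dense pattern makes sense to a first order: decode speed is largely memory-bandwidth bound, and an MoE model only reads its active experts per token, while a dense model reads every weight. A rough back-of-envelope sketch (the ~256 GB/s bandwidth figure for Strix Halo, the bits/weight, and the active-parameter counts are all assumptions; real throughput also depends on kernels, quant format, and caching):

```python
def decode_tok_s(active_params_b: float, bits_per_weight: float, mem_bw_gb_s: float) -> float:
    """First-order decode-speed estimate: every active weight is read once per token."""
    bytes_per_token_gb = active_params_b * bits_per_weight / 8  # GB streamed per token
    return mem_bw_gb_s / bytes_per_token_gb

BW = 256  # assumed GB/s for Strix Halo unified memory

# Dense 27B: all 27B params are active every token.
dense = decode_tok_s(27, 4.5, BW)
# MoE 80B with an assumed ~3B active params per token.
moe = decode_tok_s(3, 4.5, BW)
print(f"dense 27B ≈ {dense:.0f} tok/s, MoE (3B active) ≈ {moe:.0f} tok/s")
```

The absolute numbers are crude, but the ratio shows why a much larger MoE can out-run a smaller dense model on the same memory bus.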