It will run (somewhat slowly) on a five-year-old M1 Max with 64GB RAM.

Personally I prefer the 35B MoE model, which is fast enough to be interactively useful, and capable, but I would probably use the 27B if I wanted to generate whole applications like that.

I am unconvinced that most "local" AI applications need anything much more powerful than the Gemma 4 12B model. Local agentic coding is a small niche, but there are plenty of ways a local model can help with development tasks.

I would really like to see a 12B or 16B Qwen 3.6.

I am currently playing with Ornith 1.0 in the MoE configuration, which is based on the 35B variant of Qwen 3.5; I am not sure if it is better than the 3.6 version.

Benchmarks say it is; my own silly tests either suggest otherwise or suggest that I have to talk to it a bit differently.

sleepyeldrazi 7 hours ago | parent [-]

I need to ask, since I have desperately wanted to make Gemma 4 12B work. I'm not sure if it's the quant (I usually up it to q8, which is a lot higher than the iq4_nl that I use for 3.6 27B) or the model itself, but it starts confusing itself really quickly when I give it coding tasks, and soon after starts failing tool calls.

I really want to have a model that I can run locally on my 24GB M4 Pro MBP for when I don't have internet to connect to my 3090 running the Qwen, and I love how Gemma 4 models 'feel', but I can't make them competent. I am in the middle of finetuning both Qwen 3.5 9B and Gemma 4 12B just to try to bring those closer to 27B for coding/agentic tasks (and am trying to ternarize and DQT 27B so that it fits in ~9GB pre-KV).

How do you run Gemma? What do you use it for, and in what harness? Maybe llama.cpp and pi-mono just aren't right for this model and that's what I'm doing wrong.
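
For anyone following the quant sizes mentioned above, here is a rough back-of-the-envelope sketch of weight memory. It assumes the nominal parameter counts (12B, 27B) are exact, uses the bits per weight implied by llama.cpp's q8_0 and iq4_nl block layouts (32 weights plus one fp16 scale per block), and ignores the KV cache and any tensors a real GGUF keeps at higher precision. The function names are made up for illustration.

```python
# Rough weight-only memory estimate. Assumptions: nominal parameter counts,
# every tensor stored in the named format, no KV cache or runtime overhead.

# Bits per weight implied by the block layout (32 weights + one fp16 scale).
BITS_PER_WEIGHT = {
    "q8_0": 8.5,    # 34 bytes per 32 weights
    "iq4_nl": 4.5,  # 18 bytes per 32 weights
}


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def implied_bits(params_billion: float, target_gb: float) -> float:
    """Average bits per weight needed to hit a target weight size."""
    return target_gb * 8 / params_billion


if __name__ == "__main__":
    print(f"12B @ q8_0:   {weight_gb(12, BITS_PER_WEIGHT['q8_0']):.2f} GB")
    print(f"27B @ iq4_nl: {weight_gb(27, BITS_PER_WEIGHT['iq4_nl']):.2f} GB")
    print(f"27B in 9 GB:  {implied_bits(27, 9):.2f} bits/weight")
```

Under those assumptions, a 12B model at q8_0 and a 27B model at iq4_nl land in a similar range (roughly 13GB vs 15GB of weights), and a ~9GB budget for 27B works out to under 3 bits per weight on average.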

dofm 6 hours ago | parent [-]

It sounds to me like you're further along on this than I am, if you are fine-tuning? I am still mostly tinkering/learning rather than spilling out code, and I feel quite slow on it. So it doesn't matter too much to me if it is really slow. More the journey than the destination, if that makes sense. I'm stubborn.

I have tried the Gemma 4 12B model (Unsloth's QAT version) with search/browse tools in LM Studio and Unsloth Studio, when I am trying to understand a new thing. Basically I get it to write introductory starter documentation for me to absorb, because my big personal problem, these days, is focussing enough to start a project and then digging in; I need the help. I have found its limits on obscure packages (which it sometimes makes up), but before that point it's a bit like stumbling on a blog post that happens to be really right for your particular need. Good enough to work through. It's stuff I could ask Perplexity or ChatGPT to do, to be fair; I just like LM Studio for this and have the inquisitiveness to want to run it locally.

In your case: I don't believe it's the quant. I'm sure it's the model. It has good coding knowledge but it's clearly not specialised. It might be good enough at writing Python/PHP/JavaScript at a novice level. It is also quite good on WordPress tooling and functions. But I wouldn't bother with it for agentic coding if you've got experience elsewhere. Might be interesting to see what you can do with the 9B Ornith model?

Qwen 3.6 MoE in its Unsloth version is another matter. It's impressive, and I am trying to find ways for it to support my old brain doing what I've done before.