beastman82 7 hours ago

FWIW I'm running gemma4 31b on my 5090 and it's pretty great as well.

QAT, MTP, 128k context.

I liked Qwen 3.6 27b too; it just seems that Gemma4 is a bit underrated.

kofu 7 hours ago | parent | next [-]

My experience aligns with this. I'm running gemma4 31B on a 4090 through llama.cpp with unsloth models. I also run Qwen 3.6. Qwen is good for thinking and planning as it is faster, but Gemma4's generated code is much higher quality on the first try (Rust, C++ and C#), so it needs fewer revisions to reach a level I'm comfortable merging.

beastman82 6 hours ago | parent | next [-]

I second unsloth models. I'm using them over blackwell-oriented nvfp4 models as they are (empirically) the best in both quality and performance.

kroaton 2 hours ago | parent [-]

NVFP4 will be better if the model provider actually post-trained properly after quantizing.

nozzlegear 5 hours ago | parent | prev | next [-]

I can't get Gemma4 to actually finish a turn properly; it's always ending abruptly or making malformed tool calls. It's probably something I've misconfigured in oMLX or Opencode.

clusterhacks 4 hours ago | parent [-]

Huh. Same problem, and I run with llama.cpp. In my case, Gemma4-31B (4-bit quant though) will just stop sometimes.

accrual 7 hours ago | parent | prev [-]

Nice. I flip-flop between Qwen 3.5 9B Q6_M and Gemma4 12B Q4_K_M on a 4080 Super. They run at about the same speed and I can have them review each other's plans or diffs. For smaller projects I find them very capable, and I can step up to a better quant for slightly more challenging work.

nok22kon 6 hours ago | parent [-]

You can probably also run Gemma4 26B on your card at 4-bit. It's a world of difference compared with 12B.

zingar 5 hours ago | parent [-]

Where does “big model highly quantized” start getting worse than “smaller model less quantized”? Is there a general formula or is it just trial and error?

nok22kon an hour ago | parent [-]

The paper is a bit old, but it matches the current empirical recommendation: a good starting point is the biggest model you can fit at 4-bit.

https://arxiv.org/abs/2212.09720
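
As a rough illustration of the "biggest model you can fit at 4-bit" rule of thumb, here is a minimal Python sketch. It counts the weights only (parameters times bits per weight) and reserves a flat headroom for KV cache and runtime overhead. The function names, the 2 GiB headroom, and the flat 4.0 bits per weight are all assumptions for illustration; real quant formats usually store somewhat more than their nominal bit width, and long contexts need far more headroom.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def largest_fit(candidates, vram_gib, bits_per_weight=4.0, headroom_gib=2.0):
    """Return the largest parameter count (in billions) whose weights fit.

    headroom_gib is a hypothetical flat allowance for KV cache and
    runtime overhead; tune it for your context length and backend.
    Returns None when nothing fits.
    """
    budget = vram_gib - headroom_gib
    fitting = [p for p in candidates if weight_gib(p, bits_per_weight) <= budget]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    # Model sizes mentioned in this thread, checked against a 16 GiB card.
    for size in (12, 26, 31):
        print(f"{size}B at 4-bit: about {weight_gib(size, 4.0):.1f} GiB of weights")
    print("largest that fits:", largest_fit([12, 26, 31], vram_gib=16))
```

This only tells you what fits, not what performs best; whether the larger 4-bit model actually beats a smaller, less quantized one on your workload still has to be checked empirically.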