petercooper 10 days ago
Its image processing is terrible. I ran several tests comparing it with Qwen 3.5 0.8b (yes, 7% the size), and Qwen beat it every time, with Gemma often getting things entirely wrong. I even gave it a plain image saying "This is a test" and it thought for 6 minutes trying to analyze it and failed. Qwen 3.5 0.8b confidently got it in under a second. It may be that the Q6 quant I got is borked (or my LM Studio is), but either way, the 0.8b's performance is mind-boggling in comparison.
CMay 9 days ago
For Qwen 3.5 0.8B, presumably you're running it unquantized because it's so small. Get at least the Q8 of Gemma 4 12B with the F32 mmproj and use an f16 KV cache. Then run it with the latest llama.cpp that contains the Gemma 4 12B unified bug fixes, using:
--image-min-tokens 560 --image-max-tokens 2240 --batch-size 4096 --ubatch-size 4096 --temp 1.0 --top-p 0.95 --top-k 64 --jinja
It's understanding far more complex things for me and can reliably handle tiny text, so it should easily understand an image that only contains the text "This is a test".
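As a rough sketch of how the settings in that comment might be put together into one invocation, the Python helper below builds an argument list. The sampling, batch, and image-token flags are copied from the comment. The `llama-server` binary name, the `-m` and `--mmproj` options, and the `--cache-type-k`/`--cache-type-v` options are assumed to match the llama.cpp build in use. The function name and the two `.gguf` file names are hypothetical placeholders, not real release artifacts.

```python
import shlex


def build_llama_args(model_path, mmproj_path, binary="llama-server"):
    """Return an argument list using the flags quoted in the comment.

    model_path and mmproj_path are placeholders supplied by the caller;
    the binary name is an assumption about the local llama.cpp install.
    """
    return [
        binary,
        "-m", model_path,            # main model weights (e.g. a Q8 GGUF)
        "--mmproj", mmproj_path,     # vision projector (e.g. the F32 mmproj)
        "--cache-type-k", "f16",     # f16 KV cache, keys (assumed flag name)
        "--cache-type-v", "f16",     # f16 KV cache, values (assumed flag name)
        "--image-min-tokens", "560",
        "--image-max-tokens", "2240",
        "--batch-size", "4096",
        "--ubatch-size", "4096",
        "--temp", "1.0",
        "--top-p", "0.95",
        "--top-k", "64",
        "--jinja",
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    args = build_llama_args("gemma-4-12b-Q8_0.gguf", "mmproj-F32.gguf")
    print(shlex.join(args))
```

Printing the joined command before running it makes it easy to compare against whatever LM Studio or another front end passes, which may help narrow down whether the quant or the runtime settings are at fault.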
usef- 9 days ago
That sounds like a bug. Bugs are very common in open model releases on the first day. If I weren't on mobile I'd try it on Google's own app.
JacobAsmuth 9 days ago
Sounds like you're doing it wrong, to be honest.
ma2kx 10 days ago
I guess Google implements more and stronger guardrails than Alibaba, and that confuses these small models. At least this was my impression with the Gemma 3 models, which often said that the image contained nudity or sex scenes and that they therefore could not describe it. I never understood the point of this behavior.
thot_experiment 10 days ago
I've always found the Gemma models to vastly underperform on vision tasks compared to Qwen, so that's nothing new.
staticman2 9 days ago
Test it on a professional inference provider to rule out trouble on your end.