jwr 22 days ago
I have my own benchmark: a photo of filament boxes in my drawer. The task is to list (inventory) the filaments. It's not an easy task: the letters are small, the boxes are in various orientations, the labels have lots of writing on them, etc.

This is what qwen produces (basically perfect):

Fiberlogy (Blue Boxes - Left Side)
ASA Black: 750 g (0.75 kg)
Easy PLA Graphite: 850 g (0.85 kg)
Ref Easy PLA Gray: 850 g (0.85 kg)
Ref Easy PLA White: 850 g (0.85 kg)
Ref Easy PLA Inox: 850 g (0.85 kg)

Bambu Lab (Brown Boxes - Right Side)
ASA White: 750 g (0.75 kg)
PLA-CF (Burgundy Red): 1,000 g (1 kg)
PAHT-CF (Black): 500 g (0.5 kg)
PET-CF (Black): 500 g (0.5 kg)
PETG-CF (Black): 1,000 g (1 kg)

----------

And this is gemma-4-26b-a4b (basically useless):

### *Fiberlogy Filaments*
All Fiberlogy items listed below share these common specifications: Made in Poland, $1.75\text{mm}$ Diameter, $0.75\text{kg}$ Weight, and Print Temperature of $200-230^{\circ}\text{C}$.
* *PLA Black* (Barcode: 5903281997986)
* *Easy PLA Graphite* (Barcode: 5903281994057)
* *Ref Easy PLA Grey*
* *Ref Easy PLA White* (Barcode: 5903281994386)
* *Ref Easy PLA True...* (Text is cut off) **

### *Bambu Lab Filaments*
All Bambu Lab items listed below share these common specifications: Made in China, Removable Spool (Do Not Remove).
* *PLA-CF (Carbon Fiber Reinforced)*
* *PAHT-CF (High Temperature Polyamide with Carbon Fiber)*
* *PETG-CF (Carbon Fiber Reinforced)*
argee 21 days ago
Thanks. Did you set the image min/max tokens for Gemma4 to 1120 for this? Without that, this might not be a fair comparison, due to the differences in architecture.

https://www.reddit.com/r/KoboldAI/comments/1sjnjic/imagemin_...

https://github.com/ollama/ollama/issues/15626

I think 1120 vs 280 tokens is a big difference, and you were perhaps using the latter value?
benterix 22 days ago
Thanks, that's very useful. I find people's small individual tests more important than the usual benchmarks that tend to be gamed by every single lab.