Mashimo 6 hours ago

I also did some SVG tests, it's really bad.

https://chat.mistral.ai/chat/897fbe7d-b1ae-4109-9b29-f3ccc4f...

spijdar 6 hours ago | parent [-]

Wow. I get that "how well can it make SVGs" isn't the (or a) gold standard for how useful a model is or isn't, but the fact the Gemma 4 26B A4B I'm running locally can blow it out of the water doesn't give me high confidence for the model. Maybe an unfair comparison, but...

2ndorderthought 6 hours ago | parent | next [-]

It sounds like they focused performance on things other than drawing SVGs. Which, honestly, makes a lot of sense to me.

spijdar 6 hours ago | parent [-]

Drawing SVGs isn't something I really care about either, and I think it's still useful to "qualitatively compare" e.g. "Opus's pelican vs GPT's pelican vs GLM's pelican" or whatever the kids are doing.

But what stands out to me is that it's barely able to draw a "recognizable" pelican at all. The Devstral 2 model even looks slightly better, though maybe I'm splitting hairs: https://simonwillison.net/2025/Dec/9/

Mashimo 6 hours ago | parent | prev | next [-]

It's so bad I don't want to spend the 18 EUR just to test it for a month. It can't even create an SVG of the Facebook logo, and there should be plenty of examples of that around.

Gemini fast could do that in under 5 seconds.

cyanydeez 6 hours ago | parent | prev [-]

I'm curious: are you doing a real apples-to-apples comparison, or are you running a harness that already curates prompts? There's a wide margin in how any of these models respond based on already-loaded context. Most models are pretty much hot garbage until their context is curated appropriately.

spijdar 6 hours ago | parent [-]

I just copied and pasted each prompt as specified by Mashimo and simonw into a chat interface, using a 4-bit Unsloth quantization of Gemma 4 26B, with the default sampler settings recommended by Google, and a system prompt of "You are a helpful assistant". The results are miles ahead of what the Mistral model output.
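For anyone who wants to reproduce this outside a chat GUI, the same setup roughly corresponds to a request like the one below against a local OpenAI-compatible endpoint (e.g. llama.cpp's llama-server or LM Studio). This is just a sketch: the model identifier, port, and sampler values are illustrative assumptions, not the exact config used above — check the model card for the actually recommended settings.

```python
import json

# Hypothetical payload for a local OpenAI-compatible chat endpoint,
# e.g. POST http://localhost:8080/v1/chat/completions.
# Model name and sampler values are placeholders, not the poster's config.
payload = {
    "model": "gemma-4-26b-a4b-unsloth-q4",  # whatever your local quant is named
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Generate an SVG of a pelican riding a bicycle"},
    ],
    # Assumed sampler defaults; use the values the model card recommends.
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,
}

# Serialize the request body; send it with any HTTP client.
body = json.dumps(payload)
print(body)
```

The point of keeping the system prompt and samplers explicit is that it removes the "curated context" variable cyanydeez is asking about: every model gets the identical, minimal setup.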

I've gotten a lot of use out of Mistral models, and I imagine this model is pretty good at other things, but it really feels like a 128B parameter dense model should be at least a little better than this.