cyanydeez 6 hours ago

I'm curious: are you doing a real apples-to-apples comparison, or are you running a harness that already curates prompts? There's a wide margin in how any of these models respond depending on the context that's already loaded. Most models are pretty much hot garbage until their context is curated appropriately.

spijdar 6 hours ago | parent [-]

I just copied and pasted each prompt, as specified by Mashimo and simonw, into a chat interface, using a 4-bit Unsloth quantization of Gemma 4 26B with the default sampler settings recommended by Google and a system prompt of "You are a helpful assistant". The results are miles ahead of the Mistral model's output.

I've gotten a lot of use out of Mistral models, and I imagine this model is pretty good at other things, but it really feels like a 128B-parameter dense model should be at least a little better than this.