frontsideair 3 days ago

The 14B Qwen was a good choice, but it's a bit outdated now, and the new 4B version somehow seems to have surpassed it in benchmarks.

It's a balancing game: how slow a token generation speed can you tolerate? Would you rather get an answer quickly, or wait a few seconds (or sometimes minutes) for reasoning?

For quick answers, Gemma 3 12B is still good. GPT-OSS 20B is pretty quick when reasoning is set to low, in which case it usually doesn't think for longer than one sentence. I haven't gotten much use out of Qwen3 4B Thinking (2507), but at least it's fast while reasoning.
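
To put rough numbers on the speed-versus-reasoning trade-off, here's a back-of-the-envelope sketch. The function name and every figure in it are made up for illustration, not measurements of any of the models above; the only assumption is that reasoning tokens cost the same generation time as answer tokens.

```python
def seconds_to_answer(answer_tokens: int, reasoning_tokens: int, tokens_per_second: float) -> float:
    """Estimate the wait for a full reply (hypothetical helper).

    Assumes every generated token, whether hidden reasoning or visible
    answer, takes the same time, and ignores prompt processing.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return (answer_tokens + reasoning_tokens) / tokens_per_second


# Illustrative numbers only: a slower model that answers directly...
direct = seconds_to_answer(answer_tokens=200, reasoning_tokens=0, tokens_per_second=20.0)

# ...versus a model twice as fast that thinks for 2000 tokens first.
thinking = seconds_to_answer(answer_tokens=200, reasoning_tokens=2000, tokens_per_second=40.0)

print(f"direct: {direct:.1f}s, thinking: {thinking:.1f}s")
```

With these invented numbers the faster model still takes several times longer to finish, which is why a small model that reasons at length can feel slower than a bigger one that just answers.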