frontsideair | 3 days ago
14B Qwen was a good choice, but it's become a bit outdated, and the new 4B version seems to have surpassed it in benchmarks somehow. It's a balancing game: how slow a token generation speed can you tolerate? Would you rather get an answer quickly, or wait a few seconds (or sometimes minutes) for reasoning? For quick answers, Gemma 3 12B is still good. GPT-OSS 20B is pretty quick with reasoning set to low, which usually doesn't think for more than one sentence. I haven't gotten much use out of Qwen3 4B Thinking (2507), but at least it's fast while reasoning.
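
For context, here's a minimal sketch of what "reasoning set to low" can look like in practice, assuming a local OpenAI-compatible server; the base URL, port, and model tag below are typical Ollama defaults and are my assumptions, not something from the comment (LM Studio or llama.cpp's server would work much the same way):

    # Sketch: query a local gpt-oss-20b with reasoning effort set to low.
    # Assumes Ollama's OpenAI-compatible endpoint and its gpt-oss:20b tag.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="gpt-oss:20b",
        messages=[
            # gpt-oss takes its reasoning effort from the system prompt;
            # "low" keeps the thinking phase to roughly a sentence.
            {"role": "system", "content": "Reasoning: low"},
            {"role": "user", "content": "Summarize the tradeoff between model size and latency."},
        ],
    )
    print(resp.choices[0].message.content)

With effort left at medium or high, the same call works but the answer arrives noticeably later, which is exactly the quick-answer-versus-reasoning tradeoff described above.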