onlyrealcuzzo 2 hours ago
I just tested this on a bug-fixing benchmark I'm working on, and it did not perform as well as I expected. Qwen2.5-Coder-3B (2 years old) outperformed it by a wide margin: it fixed ~50% of bugs, whereas this model fixed only ~12%. Granted, this isn't a coder-specific model, but given its benchmark performance relative to the Gemma models, the fact that it's two years newer, and that it's an MoE with 8B total params, I expected it to be more competitive.
XCSme 44 minutes ago
I will test it once it's accessible via OpenRouter, but the previous LFM2 model (lfm-2-24b-a2b) didn't do well on my tests. It got only 1/20 questions/tasks right, way below Gemma 31B or Qwen 35b-a3b (those get around 10/20 right).
debazel an hour ago
I tried it with OpenCode and it is borderline incapable of making tool calls, so that might be why it does so badly on your test.
HanClinto 2 hours ago
Some of the coding-specific fine-tunes gave really impressive boosts. Qwen2.5-3B-Instruct is also available [0]. If it's not too much to ask, I'd be curious to see how more general models stack up in your benchmark.