| ▲ | BoorishBears 2 days ago | |
Benchmarks are a pox on LLMs. You can use this model for about 5 seconds and realize its reasoning is in a league well above any Qwen model, but instead people assume benchmarks that are openly being used for training are still relevant. | ||
| ▲ | girvo a day ago | |
They really are. Benchmaxxing is real… but the Qwen 3.5 series of models is still very impressive. I’m looking forward to trying out Gemma. | ||
| ▲ | j45 2 days ago | |
You definitely have to try each model on your own use case. Many models can be trained to perform better on these tests, but that might not transfer to what you actually need. | ||