▲ _boffin_ 2 days ago
What was the main focus when training this model? Beyond the Elo score, it looks like the models (31B / 26B-A4) are underperforming on some of the typical benchmarks by a wide margin. Do you believe there's an issue with the tests, or that the results are misleading (e.g., comparison models benchmaxxing)? Thank you for the release.
▲ BoorishBears 2 days ago | parent
Benchmarks are a pox on LLMs. You can use this model for about 5 seconds and realize its reasoning is in a league well above any Qwen model, but instead people assume benchmarks that are openly being used for training are still relevant.