| ▲ | minimaxir 2 days ago | |
I can't find what ELO score specifically the benchmark chart is referring to, it's just labeled "Elo Score". It's not Codeforces ELO as that Gemma 4 31B has 2150 for that which would be off the given chart. | ||
| ▲ | nabakin 2 days ago | parent [-] | |
It's referring to the Lmsys Leaderboard/Lmarena/Arena.ai[0]. It's very well-known in the LLM community for being one of the few sources of human evaluation data. | ||