nabakin 2 days ago
Public benchmarks can be trivially faked. LMArena is a bit harder to fake and is human-evaluated. I agree it's misleading for them to hyper-focus on one metric, but public benchmarks are far from the only thing that matters. I place more weight on LMArena scores and private benchmarks.
nl a day ago
Concentrating on LMArena cost Meta many hundreds of billions of dollars, and cost lots of people their jobs, with the Llama 4 disaster.
moffkalast 2 days ago
LMArena is so easy to game that it ceased to be a relevant metric over a year ago. People are not usable validators beyond "yeah, that looks good to me"; nobody checks whether the facts are correct.