iLoveOncall a day ago

Hasn't it been proven many times that all those companies cheat on benchmarks?

I personally couldn't care less about them, especially when we've seen many times that the public's perception is absolutely not tied to the benchmarks (Llama 4, the recent OpenAI model that flopped, etc.).

sebzim4500 a day ago

I don't think there's any real evidence that any of the major companies are going out of their way to cheat the benchmarks. The problem is that, unless you put a lot of effort into avoiding contamination, you will inevitably end up with details about the benchmark in the training set.