coder543 3 hours ago

It is missing both models that I mentioned, so yes, I would say one reason it is not accurate is that it is so incomplete.

It also doesn't provide error bars on the Elo ratings, so models that only have tens of battles are listed alongside models that have thousands of battles, with no indication of how much confidence each rating deserves, which I find rather unhelpful.
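To make the error-bar point concrete, here is a rough sketch of how much the uncertainty changes with battle count. It is not how that leaderboard computes anything. It assumes the simplest possible case: one model against one fixed opponent, with no ties, where a win rate p maps to an Elo gap of 400 * log10(p / (1 - p)). The function names (`wilson_interval`, `elo_gap`, `elo_gap_interval`) are made up for illustration, and the interval is a standard Wilson score interval on the win rate, converted to Elo points.

```python
import math


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a win rate (z=1.96 is roughly 95%)."""
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def elo_gap(p):
    """Elo difference implied by win rate p against a single opponent."""
    return 400 * math.log10(p / (1 - p))


def elo_gap_interval(wins, n):
    """Approximate interval on the Elo gap.

    Assumption: at least one win and one loss, otherwise the
    log-odds are undefined at the interval's edge.
    """
    lo, hi = wilson_interval(wins, n)
    return elo_gap(lo), elo_gap(hi)


# Same 70% win rate, very different amounts of evidence.
few = elo_gap_interval(7, 10)
many = elo_gap_interval(700, 1000)
print("10 battles:   %.0f to %.0f" % few)
print("1000 battles: %.0f to %.0f" % many)
```

Both cases have the same point estimate, but with 10 battles the interval is hundreds of points wide and includes zero, while with 1000 battles it is a few dozen points wide. A real leaderboard fits many models jointly, so its error bars would more likely come from something like bootstrapping over the battles, but the dependence on sample size is the same.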

A lot of these models are also sensitive to how they are used, and offer multiple ways to be used. It's not clear how they are being invoked.

That leaderboard is definitely one of the ones that leaves a lot to be desired.