See: https://lmarena.ai/leaderboard
Unless you overfit to benchmark-style scenarios and end up worse for real-world use.