vunderba 2 hours ago
That arena leaderboard has some questionable results. Anyone who's used these models would know that ranking HiDream above Krea2 is a pretty hot take. Many of these Elo comparative tests (ArtificialAnalysis is guilty as hell of this as well) also have other problems, such as a considerable number of "amateur judges" who tend to prioritize aesthetics over actual instruction-following given the prompt. Also (less a critique of Arena.AI specifically), the MAI models are so incredibly locked down (i.e. censored) as to be functionally useless. I have a sneaking suspicion it's fallout from Tay.