Remix clone Hacker News

	vunderba 2 hours ago
		That arena leaderboard has some questionable results. Anyone who's used these models would know that ranking HiDream above Krea2 is a pretty hot take. Many of these Elo-style comparative tests (ArtificialAnalysis is guilty as hell of this as well) also have other problems, such as a considerable number of "amateur judges" who tend to prioritize aesthetics over actual instruction-following given the prompt. Also (less a critique of Arena.AI specifically), the MAI models are so incredibly locked down (i.e. censored) as to be functionally useless. I have a sneaking suspicion it's fallout from Tay. https://en.wikipedia.org/wiki/Tay_(chatbot)