encroach 6 days ago

This outperforms Gemini 3 Pro Image (Nano Banana Pro) on both the Text-to-Image Arena and the Image Edit Arena. I'm surprised they didn't mention this leaderboard in the blog post.

I like this benchmark because it's based on user votes, so overfitting is not as easy (after all, if users prefer your result, you've won).

https://lmarena.ai/leaderboard/text-to-image

https://lmarena.ai/leaderboard/image-edit

ygouzerh 5 days ago

The scores are really, really close, which might be why.

nycdatasci 6 days ago

The arena concept doesn’t work for image models due to watermarks.

encroach 6 days ago

There are no watermarks in the arena.

nycdatasci 5 days ago

There are no visible watermarks, but model makers can use steganographic codes to identify outputs from their own models.

nycdatasci 5 days ago

Text-to-Image Models Leave Identifiable Signatures: Implications for Leaderboard Security

https://arxiv.org/pdf/2510.06525

encroach 5 days ago

This is true; however, LMArena does employ some methods to mitigate attempts to manipulate the leaderboard. See https://openreview.net/forum?id=zf9zwCRKyP

They also control for style: https://news.lmarena.ai/sentiment-control/