Remix.run Logo
pplonski86 8 hours ago

There are so many models, is there any website with list of all of them and comparison of performance on different tasks?

Reubend 8 hours ago | parent | next [-]

The post actually has great benchmark tables inside of it. They might be outdated in a few months, but for now, it gives you a great summary. Seems like Gemini wins on image and video perf, Claude is the best at coding, ChatGPT is the best for general knowledge.

But ultimately, you need to try them yourself on the tasks you care about and just see. My personal experience is that right now, Gemini Pro performs the best at everything I throw at it. I think it's superior to Claude and all of the OSS models by a small margin, even for things like coding.

Imustaskforhelp 8 hours ago | parent [-]

I like Gemini Pro's UI over Claude so much but honestly I might start using Kimi K2.5 if its open source & just +/- Gemini Pro/Chatgpt/Claude because at that point I feel like the results are negligible and we are getting SOTA open source models again.

wobfan 6 hours ago | parent [-]

> honestly I might start using Kimi K2.5 if its open source & just +/- Gemini Pro/Chatgpt/Claude because at that point I feel like the results are negligible and we are getting SOTA open source models again.

Me too!

> I like Gemini Pro's UI over Claude so much

This I don't understand. I mean, I don't see a lot of difference in both UIs. Quite the opposite, apart from some animations, round corners and color gradings, they seem to look very alike, no?

Imustaskforhelp 4 hours ago | parent [-]

Y'know I ended up buying Kimi's moderato plan which is 19$ but they had this unique idea where you can talk to a bot and they could reduce the price

I made it reduce the price of first month to 1.49$ (It could go to 0.99$ and my frugal mind wanted it haha but I just couldn't have it do that lol)

Anyways, afterwards for privacy purposes/( I am a minor so don't have a card), ended up going to g2a to get a 10$ Visa gift card essentially and used it. (I had to pay a 1$ extra but sure)

Installed kimi code on my mac and trying it out. Honestly, I am kind of liking it.

My internal benchmark is creating pomodoro apps in golang web... Gemini 3 pro has nailed it, I just tried the kimi version and it does have some bugs but it feels like it added more features.

Gonna have to try it out for a month.

I mean I just wish it was this cheap for the whole year :< (As I could then move from, say using the completely free models)

Gonna have to try it out more!

coffeeri 8 hours ago | parent | prev [-]

There is https://artificialanalysis.ai

XCSme 3 hours ago | parent | next [-]

There are many lists, but I find all of them outdated or containing wrong information or missing the actual benchmarks I'm looking for.

I was thinking, that maybe it's better to make my own benchmarks with the questions/things I'm interested in, and whenever a new model comes out run those tests with that model using open-router.

pplonski86 8 hours ago | parent | prev [-]

Thank you! Exactly what I was looking for