IgorPartola 8 hours ago

I wish there was some easy resource to keep up with the latest models. The best I have come up with so far is asking one model to research the others. Realistically I want to know latest versions, best use case, performance (in terms of speed) relative to some baseline, and hardware requirements to run it.

__mharrison__ 5 hours ago | parent | next [-]

I use Aider heavily and find their benchmark to be pretty good. It is updated relatively frequently (most recently a month ago, which may be an eternity in AI time).

https://aider.chat/docs/leaderboards/

Jgoauh 7 hours ago | parent | prev | next [-]

have you tried https://artificialanalysis.ai/

JimDugan 6 hours ago | parent [-]

Dumb collation of benchmarks that the big labs are essentially training on. Livebench.ai is the industry standard - uncontaminated, with new questions every few months.

IgorPartola 6 hours ago | parent [-]

Thanks! Are the scores in some way linear here? As in, if model A is rated at 25 and model B at 50, does that mean I will see half as many mistakes with model B? Get answers that are 2x more accurate? Or is it subjective?

exe34 8 hours ago | parent | prev [-]

> asking one model to research the others.

that's basically choosing at random with extra steps!

throwup238 7 hours ago | parent [-]

Research, not spit out an answer from its weights. Just ask Gemini/Claude to do deep research on /r/LocalLLama and HN posts.
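
If you want to do the gathering step yourself before handing things to a model, here's a minimal sketch that pulls recent HN stories via the public Algolia HN search API. The query string, result fields, and function name are just illustrative choices, not a fixed workflow.

    # Sketch: fetch recent HN stories about model releases (Algolia HN API),
    # then paste/feed the links into whatever model you use for research.
    import requests

    def recent_model_threads(query="new LLM release", hits=10):
        # Newest-first story search; query and hit count are arbitrary examples
        resp = requests.get(
            "https://hn.algolia.com/api/v1/search_by_date",
            params={"query": query, "tags": "story", "hitsPerPage": hits},
            timeout=10,
        )
        resp.raise_for_status()
        return [
            (h["title"], f"https://news.ycombinator.com/item?id={h['objectID']}")
            for h in resp.json()["hits"]
        ]

    if __name__ == "__main__":
        for title, url in recent_model_threads():
            print(title, "-", url)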