Jgoauh 7 hours ago

Have you tried https://artificialanalysis.ai/

JimDugan 6 hours ago

Dumb collation of benchmarks that the big labs are essentially training on. Livebench.ai is the industry standard: uncontaminated, with new questions every few months.

IgorPartola 6 hours ago

Thanks! Are the scores linear in any way? For example, if model A is rated 25 and model B is rated 50, does that mean model B will make half as many mistakes? Or that its answers are twice as accurate? Or is the scale subjective?
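The two readings in that question ("half the mistakes" vs. "2x as accurate") are not the same thing, even on a simple scale. A minimal sketch, assuming the score is just the percentage of questions answered correctly (an assumption; a composite leaderboard score averaged across categories may not map this directly):

```python
import math


def compare_scores(score_a: float, score_b: float) -> dict:
    """Compare two benchmark scores, ASSUMING score = percent of questions answered correctly (0-100)."""
    errors_a = 100.0 - score_a  # percent of questions model A gets wrong
    errors_b = 100.0 - score_b  # percent of questions model B gets wrong
    return {
        "correct_ratio": score_b / score_a,  # how many times more correct answers B gives
        "error_ratio": errors_b / errors_a,  # B's mistakes as a fraction of A's mistakes
    }


result = compare_scores(25, 50)
print(result["correct_ratio"])  # 2.0: twice as many correct answers
print(math.isclose(result["error_ratio"], 2 / 3))  # True: only a third fewer mistakes, not half
```

Under that assumption, going from 25 to 50 doubles the correct answers but cuts mistakes from 75% to 50% of questions, a one-third reduction rather than half.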