InkCanon 8 months ago
> 4.1 Was better in 55% of cases
Um, isn't that just a fancy way of saying it is slightly better?
> Score of 6.81 against 6.66
So very slightly better

wiz21c 8 months ago
"they found that GPT‑4.1 excels at both precision..." They didn't say it is better than Claude at precision etc. Just that it excels. Unfortunately, AI has still not concluded that manipulation by the marketing dept is a plague...

kevmo314 8 months ago
A great way to upsell 2% better! I should start doing that.

marsh_mellow 8 months ago
I don't think the absolute score means much; judge models have a tendency to score around 7/10 lol. 55% vs. 45% equates to about a 35 point difference in Elo. In chess that would be two players in the same league, but one with a clear edge.

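For anyone who wants to check the win-rate-to-Elo conversion, here is a minimal sketch. It assumes the standard logistic Elo model (400-point scale, base 10) and treats the 55% figure as an expected score with no draws; the function names `elo_gap` and `expected_score` are made up for illustration. Under those assumptions the gap lands in the mid-30s, in line with the figure above.

```python
import math


def elo_gap(win_rate: float) -> float:
    """Rating gap implied by an expected score, under the standard logistic Elo model."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    # Invert E = 1 / (1 + 10 ** (-gap / 400)) to solve for the gap.
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


def expected_score(gap: float) -> float:
    """Expected score of the stronger side for a given rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


print(round(elo_gap(0.55), 1))  # about 34.9
```

The same function shows how quickly the gap grows with win rate: a 64% win rate already implies roughly a 100 point gap, so 55% really is the shallow end of the scale.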