guptadagger 3 days ago

Speaking of ChatGPT getting worse over time, it would be interesting to see ChatGPT benchmarked continuously to track how it performs over time, with the results published somewhere publicly.

Even local variations would be interesting

arnaudsm 3 days ago

https://livebench.ai/ does that. The latest gpt4o significantly underperforms previous versions.