Aurornis 2 hours ago

This comes up so frequently that I’ve seen at least 3-4 different websites running daily benchmarks on providers and plotting their performance.

The last one I bookmarked has already disappeared. I think they’re generally vibe coded by developers who think they’re going to prove something, but then realize how expensive it is to spend money on tokens every day.

They also use limited subsets of big benchmarks to keep costs down, which increases the noise in the results. The last time someone linked to one of these sites as evidence of a decline in quality, the chart was a noisy, mostly flat graph with a regression line drawn over it that sloped very slightly downward.
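To put rough numbers on the subset problem, here's a quick sketch. My assumptions, not anything those sites actually do: each question is an independent pass/fail, the model's true accuracy is a constant 70%, and the daily run uses a 50-question subset. The function names are just made up for illustration. The standard error of a score on n questions is sqrt(p(1-p)/n), and fitting a line to a series whose true slope is zero will still give you some nonzero slope.

```python
import math
import random


def standard_error(p: float, n: int) -> float:
    """Standard error of an accuracy score measured on n independent pass/fail questions."""
    return math.sqrt(p * (1 - p) / n)


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate_daily_scores(true_p: float, n_questions: int, days: int, seed: int = 0) -> list[float]:
    """Daily scores for a hypothetical model whose true accuracy never changes."""
    rng = random.Random(seed)
    scores = []
    for _ in range(days):
        # Each question is answered correctly with probability true_p.
        correct = sum(rng.random() < true_p for _ in range(n_questions))
        scores.append(correct / n_questions)
    return scores


if __name__ == "__main__":
    for n in (50, 500):
        print(f"n={n}: +/- {standard_error(0.7, n):.3f} per daily score")
    days = list(range(30))
    scores = simulate_daily_scores(true_p=0.7, n_questions=50, days=30)
    # The true slope is exactly zero, so whatever slope gets fitted here is noise.
    print(f"fitted slope on flat data: {ols_slope(days, scores):+.5f} per day")
```

With 50 questions a single day's score has a standard error of about 6.5 percentage points even when nothing changed, versus about 2 points with 500 questions. Put a trend line through 30 days of the 50-question version and it will almost never come out perfectly flat, which is exactly the kind of "slight downward slope" that gets posted as proof.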