Herring 11 hours ago

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

He’s wrong, we still scaling, boys.

rockinghigh 11 hours ago | parent | next [-]

You should read the transcript. He's including 2025 in the age of scaling.

> Maybe here’s another way to put it. Up until 2020, from 2012 to 2020, it was the age of research. Now, from 2020 to 2025, it was the age of scaling—maybe plus or minus, let’s add error bars to those years—because people say, “This is amazing. You’ve got to scale more. Keep scaling.” The one word: scaling.

> But now the scale is so big. Is the belief really, “Oh, it’s so big, but if you had 100x more, everything would be so different?” It would be different, for sure. But is the belief that if you just 100x the scale, everything would be transformed? I don’t think that’s true. So it’s back to the age of research again, just with big computers.

Herring 11 hours ago | parent [-]

Nope, Epoch.ai thinks we have enough to scale till 2030 at least. https://epoch.ai/blog/can-ai-scaling-continue-through-2030


mindwok 7 hours ago | parent | next [-]

That article is more about feasibility than desirability. There's even a section where they say:

> Settling the question of whether companies or governments will be ready to invest upwards of tens of billions of dollars in large scale training runs is ultimately outside the scope of this article.

Ilya is saying it's unlikely to be desirable, not that it isn't feasible.

techblueberry 7 hours ago | parent | prev | next [-]

Wait, nope because someone disagrees?

imiric 10 hours ago | parent | prev [-]

That article is from August 2024. A lot has changed since then.

Specifically, the performance of SOTA models has been plateauing on all popular benchmarks, and this was especially evident in 2025. It's why every major model announcement shows comparisons against other models rather than a historical graph of performance over time. Benchmarks are far from a reliable measure of these tools' capabilities anyway, and they will continue to be reinvented and gamed, but the asymptote is showing even on the vendors' own benchmarks.

We can certainly continue to throw more compute at the problem. But the point is that scaling the current generation of tech will continue to yield diminishing returns.

To make up for this, "AI" companies are now focusing on engineering. 2025 has been the year of MCP, "agents", "skills", etc., which will continue in 2026. This is a good thing, as these tools need better engineering around them, so they can deliver actual value. But the hype train is running out of steam, and unless there is a significant breakthrough soon, I suspect that next year will be a turning point in this hype cycle.

ojbyrne 7 hours ago | parent [-]

I’m curious how you deduced it’s from 2024. Timestamps on the article and the embedded video are both November 2025.

rdedev 10 hours ago | parent | prev | next [-]

The 3rd graph is interesting. Once the model performance reaches above human baseline, the growth seems to be logarithmic instead of exponential.

epistasis 11 hours ago | parent | prev | next [-]

That blog post is eight months old. That feels like pretty old news in the age of AI. Has it held since then?

conception 11 hours ago | parent | next [-]

It looks like it’s been updated as it has codex 5.1 max on it

an0malous 7 hours ago | parent | prev [-]

“Time it takes for a human to complete a task that AI can complete 50% of the time” seems like a really contrived metric. Suppose it takes 30 minutes to write code to scrape a page and also 30 minutes to identify a bug in a SQL query: an AI’s ability to solve the former has virtually no bearing on its ability to solve the latter, yet we lump them into the same set of “30-minute problems.” Where do they get the data for task durations anyway?
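For context, the metric is roughly: fit a curve of AI success probability against human task time, then read off the time at which success crosses 50%. A minimal sketch with made-up data (the logistic-in-log-time fit and the sample tasks are illustrative assumptions, not METR's actual data or code):

```python
import math

# Hypothetical (human_minutes, ai_succeeded) observations -- invented for
# illustration. In practice each task is timed with human baseliners.
data = [(2, 1), (5, 1), (10, 1), (20, 1), (40, 1),
        (40, 0), (80, 0), (160, 0), (320, 0), (640, 0)]

def fit_logistic(data, lr=0.1, steps=20000):
    """Fit P(success) = sigmoid(a + b * log2(minutes)) by gradient descent."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        ga = gb = 0.0
        for minutes, y in data:
            x = math.log2(minutes)
            p = 1 / (1 + math.exp(-(a + b * x)))
            ga += p - y          # gradient of log-loss w.r.t. intercept
            gb += (p - y) * x    # gradient w.r.t. slope
        a -= lr * ga / len(data)
        b -= lr * gb / len(data)
    return a, b

a, b = fit_logistic(data)
# 50% time horizon: solve a + b * log2(t) = 0  =>  t = 2 ** (-a / b)
horizon = 2 ** (-a / b)
print(f"50% time horizon: {horizon:.0f} min")
```

Note the metric's implicit assumption, which is exactly the objection above: success probability is treated as a function of human task *duration* alone, so a 30-minute scraping task and a 30-minute SQL-debugging task are interchangeable data points in the fit.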