benlivengood 6 hours ago
METR [0] explicitly measures progress on long-horizon tasks, and right now that curve is as steep a sigmoid as the other progress curves, with no inflection point yet. As others have pointed out in other threads, RLHF has moved training beyond pure next-token prediction, and modern models are modeling concepts [1].

[0] https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
[1] https://www.anthropic.com/news/tracing-thoughts-language-mod...
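Why "no inflection yet" matters: before its midpoint, a sigmoid (logistic) curve is almost indistinguishable from a pure exponential. A minimal Python sketch, with hypothetical parameters (ceiling, rate and midpoint are illustrative only, not fitted to any METR data), prints the ratio of the two curves over time:

```python
import math

def logistic(t, ceiling, rate, midpoint):
    """Sigmoid curve; its inflection point is at t == midpoint."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))

def exponential(t, ceiling, rate, midpoint):
    """Pure exponential matching the logistic's behavior long before the midpoint."""
    return ceiling * math.exp(rate * (t - midpoint))

# Hypothetical parameters chosen for illustration only
CEILING, RATE, MIDPOINT = 1000.0, 1.0, 10.0

for t in [0, 4, 8, 10, 12]:
    ratio = logistic(t, CEILING, RATE, MIDPOINT) / exponential(t, CEILING, RATE, MIDPOINT)
    print(f"t={t:>2}: logistic/exponential = {ratio:.3f}")
```

The ratio stays close to 1 well before the midpoint and drops to exactly 0.5 at the inflection point, so data collected before the inflection says little about where the ceiling is.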
Vegenoid 5 hours ago
At the risk of coming off like a dolt and being super incorrect: I don't put much stock in these metrics when it comes to predicting AGI. Even if the trend of "the length of task an AI can reliably do doubles every 7 months" continues, as they say, that means we're years away from AI that can complete tasks that take humans weeks or months. I'm skeptical that the doubling trend will continue into that timescale. I think there is a qualitative difference between tasks that take weeks or months and tasks that take minutes or hours, a difference that simple quantity doesn't capture. I think many people responsible for hiring engineers are keenly aware of this distinction, from their experience trying to choose good engineers based on how they perform in task-driven technical interviews that last only hours.

Intelligence as humans have it seems like a "know it when you see it" thing to me, and metrics that attempt to define and compare it will always be looking at only a narrow slice of the whole picture. To put it simply, the gut feeling I get from my interactions with current AI, and from how it has developed over the past couple of years, is that AI is missing key elements of general intelligence at its core. While there's lots more room for its current approaches to get better, I think something different will be needed for AGI.

I'm not an expert, just a human.
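To make the "years away" arithmetic concrete, here is a minimal Python sketch. It assumes the 7-month doubling period holds exactly, and it uses a hypothetical starting horizon of one hour plus a 40-hour work-week and a ~167-hour work-month; those three numbers are illustrative assumptions, not METR figures.

```python
import math

DOUBLING_MONTHS = 7.0  # doubling period quoted in the comment above

def months_until(target_hours: float, current_hours: float,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months until the task horizon reaches target_hours if doubling continues unchanged."""
    if target_hours <= current_hours:
        return 0.0
    # Number of doublings needed, times the length of one doubling period
    return doubling_months * math.log2(target_hours / current_hours)

CURRENT_HOURS = 1.0  # hypothetical starting horizon, for illustration only

for label, hours in [("one work-week (40 h)", 40.0), ("one work-month (~167 h)", 167.0)]:
    m = months_until(hours, CURRENT_HOURS)
    print(f"{label}: {m:.0f} months (~{m / 12:.1f} years)")
```

Under those assumptions it prints about 37 months (~3.1 years) to a work-week horizon and about 52 months (~4.3 years) to a work-month horizon. Because the result depends on log2 of the ratio, doubling or halving the assumed starting horizon only shifts it by one 7-month period.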
Fraterkes 6 hours ago
The METR graph proposes a 6-year trend based largely on 4 data points from before 2024. I get that it's hard to do analyses since we're in uncharted territory, and I personally find a lot of the AI stuff impressive, but this just doesn't strike me as great statistics.
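To illustrate why four points make for shaky statistics, here is a minimal Python sketch. It assumes an ordinary least-squares fit of log2(task horizon) against time, a common way to estimate a doubling time but not necessarily METR's exact method, and the four data points are made up for illustration (they are NOT METR's data). With n = 4 the fit has only 2 residual degrees of freedom, so the two-sided 95% Student-t multiplier is about 4.30 rather than roughly 2.

```python
import math

def fit_doubling(years, horizons_min):
    """OLS fit of log2(horizon) on year; returns (doublings per year, standard error)."""
    n = len(years)
    ys = [math.log2(h) for h in horizons_min]
    xbar = sum(years) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in years)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(years, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    resid = [y - (intercept + slope * x) for x, y in zip(years, ys)]
    s2 = sum(r * r for r in resid) / (n - 2)  # residual variance, n - 2 degrees of freedom
    return slope, math.sqrt(s2 / sxx)

T_975_DF2 = 4.303  # two-sided 95% Student-t critical value for 2 degrees of freedom

# Hypothetical, made-up points (NOT METR's data): year and task horizon in minutes
years = [2019.5, 2020.5, 2022.5, 2023.5]
horizons = [0.1, 0.5, 3.0, 8.0]

slope, se = fit_doubling(years, horizons)
lo, hi = slope - T_975_DF2 * se, slope + T_975_DF2 * se
print(f"doubling time: {12 / slope:.1f} months "
      f"(95% CI {12 / hi:.1f} to {12 / lo:.1f} months)")
```

On these made-up points the estimated doubling time is about 7.9 months, but the 95% interval runs from roughly 5.8 to 12.5 months, a spread that compounds quickly once the trend is extrapolated several doublings ahead.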