jsnell 2 days ago

> GPT-5, Claude, and Gemini represent remarkable achievements, but they’re hitting asymptotes

This part could do with sourcing; it seems clearly untrue to me. We only have three types of benchmark: a) ones that have been saturated, b) ones where AI performance is progressing rapidly, and c) newly introduced ones that were specifically designed for the then-current frontier models to fail on. Look, for example, at the METR long-time-horizon task benchmark, which is one that's particularly resistant to saturation.

The entire article is premised on this unsupported and probably untrue claim, but it's hard to engage with when we have no idea why the author thinks it's true.

> The path to artificial general intelligence isn’t through training ever-larger language models

Then it's a good thing that it's not the path most of the frontier labs are taking. It appears to be what xAI is doing across the board, and it was probably what GPT-4.5 was. Neither is a particularly compelling success story. But all the other progress over the last 12-18 months has come from models the same size or smaller advancing the frontier. And it has come from exactly the kind of engineering improvements the author claims need to happen, both to the models and to the scaffolding around them (RL on chain of thought, synthetic data, distillation, model routing, tool use, subagents).
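To make one of those scaffolding improvements concrete: "model routing" just means a cheap decision procedure picks which model tier serves each request, so the expensive model only runs when needed. A minimal sketch, assuming invented model names and a deliberately crude difficulty heuristic (real routers are typically learned classifiers):

```python
# Toy illustration of model routing: a cheap heuristic decides whether
# a small or large model should handle each request. The model names
# and the difficulty heuristic are made up for illustration.

def estimate_difficulty(prompt: str) -> float:
    """Crude stand-in for a learned router: treat longer or
    code-heavy prompts as harder (score in [0, 1])."""
    score = min(len(prompt) / 500, 1.0)
    if any(tok in prompt for tok in ("def ", "class ", "traceback")):
        score = max(score, 0.8)
    return score

def route(prompt: str, threshold: float = 0.5) -> str:
    """Return which (hypothetical) model tier should serve the prompt."""
    return "large-model" if estimate_difficulty(prompt) >= threshold else "small-model"

print(route("What's the capital of France?"))          # small-model
print(route("Here's a traceback from my build: ..."))  # large-model
```

The point is that the win comes from the scaffolding around the models, not from making any single model bigger.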

Sorry, no, they're not exactly the same kind of engineering improvements. They're the kind of engineering improvements that the people actually building these systems thought would be useful and that actually worked. We don't see the failed experiments, and we don't see the ideas that weren't well-baked enough to even experiment on.