AndrewDucker | 7 days ago
My personal definition is "the ability to form models from observations and extrapolate from them." LLMs are great at forming models of language from observations of language and extrapolating language constructs from them. But to get general intelligence, we're going to have to let an AI build its models from direct measurements of reality.
daveguy | 7 days ago
> LLMs are great at forming models of language

They really aren't even great at forming models of language. They *are* a single model of language. They don't build models, much less use them. See, for example, ARC-AGI 1 and 2 (a sketch of the task format is below). They only performed decently on ARC 1 with additional training [0], and are failing miserably on ARC 2. That's not even getting to ARC 3.

[0] https://arcprize.org/blog/oai-o3-pub-breakthrough

> Note on "tuned": OpenAI shared they trained the o3 we tested on 75% of the Public Training set. They have not shared more details. We have not yet tested the ARC-untrained model to understand how much of the performance is due to ARC-AGI data. ...

Clearly the models are not able to reason about these problems without additional training, and there is no indication that the additional training didn't include feature extraction, scaffolding, RLHF, etc. created by human intelligence. It's impressive that fine-tuning can get >85%, but that is still additional human-directed training, not self-contained intelligence at the reported level of performance. The blog was very generous to relegate the undefined "fine tuning" to a footnote and praise the results as if they came directly from the model, on a run that would have cost >$65,000.

Edit: to be clear, I understand LLMs are a huge leap forward in AI research, and possibly the first models that can provide useful results across multiple domains without being retrained. But they're still not creating their own models, even of language.
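For anyone who hasn't looked at the benchmark: each ARC task is a few-shot puzzle. Here's a minimal Python sketch, assuming the public JSON layout from the fchollet/ARC repository; the toy "mirror" rule and all the data below are invented for illustration, not a real task.

    # A hand-made example in the public ARC-AGI task format (the
    # fchollet/ARC repo stores each task as JSON with "train" and
    # "test" lists of {"input": grid, "output": grid} pairs, where
    # grids are lists of lists of ints 0-9). The rule here, "mirror
    # the grid horizontally", is invented for illustration.
    task = {
        "train": [
            {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
            {"input": [[3, 4], [0, 0]], "output": [[4, 3], [0, 0]]},
        ],
        "test": [
            {"input": [[5, 0], [0, 6]], "output": [[0, 5], [6, 0]]},
        ],
    }

    # The benchmark's whole point: infer the rule from the few
    # demonstration pairs, then apply it to the unseen test input.
    # For this toy rule the inferred "model" is a horizontal flip.
    def solve(grid):
        return [list(reversed(row)) for row in grid]

    for pair in task["train"] + task["test"]:
        assert solve(pair["input"]) == pair["output"]
    print("toy task solved")

The point of contention above is exactly this step: a human (or a general intelligence) infers a new little model per task from two or three examples, while the reported LLM scores leaned on training against the public task set beforehand.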