danans 16 hours ago

> It's incredibly difficult to compress information without having at least some internal model of that information. Whether that model is a "world model" that fits the definition of folks like Sutton and LeCun is semantic.

Sutton emphasizes his point by saying that LLMs trying to reach AGI is futile because their world models are less capable than a squirrel's, in part because the squirrel has direct experiences and its own goals, and is capable of continual learning based on those in real time, whereas an LLM has none of those.

Finally he says if you could recreate the intelligence of a squirrel you'd be most of the way toward AGI, but you can't do that with an LLM.

LarsDu88 14 hours ago | parent | next [-]

This is actually a pretty good point, but quite honestly isn't this just an implementation detail? We can wire up a squirrel robot, give it a wifi connection to a Cerebras inference engine with a big context window, then let it run about during the day collecting a video feed while directing it to do "squirrel stuff".

Then during the night, we make it go to sleep and use the data collected during the day to continue finetuning the actual model weights in some data center somewhere.

After 2 years, this model would have a ton of "direct experiences" about the world.
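
Roughly the loop I have in mind, as a sketch (every name here - robot, model, finetune - is a placeholder, not a real Cerebras or robotics API):

    # Hypothetical day/night loop; all objects and functions are stand-ins.
    def run_robo_squirrel(robot, model, finetune, days=730):  # ~2 years
        for _ in range(days):
            experiences = []
            # Daytime: act in the world with the current frozen weights,
            # logging the video feed and the actions taken.
            for frame in robot.daytime_video():
                action = model.act(frame, goal="squirrel stuff")
                robot.execute(action)
                experiences.append((frame, action))
            # Nighttime: ship the day's data to a datacenter and update
            # the actual model weights before the next day starts.
            model = finetune(model, experiences)
        return model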

danans 7 hours ago | parent [-]

> then let it run about during the day collecting a video feed while directing it to do "squirrel stuff".

Your phrase "squirrel stuff" is doing a lot of work.

What are the robo-squirrel's "goals," and how do they relate to the physical robot?

Is it going around trying to find spare electronic parts to repair itself and reproduce? How does the video feed data relate to its goals?

Where do these goals come from?

Despite all their expensive training, LLMs do not develop goals of their own. Why would goals emerge for your robot squirrel, especially when the survival of its brain is not dependent on the survival of its mechanical body?

ninetyninenine 15 hours ago | parent | prev [-]

Except Sutton has no idea, or even a clue, about the internal model of a squirrel. He just uses it as a symbol for something utterly stupid but still smarter than an LLM. It's semantic manipulation in an attempt to prove his point, but he proves nothing.

We have no idea how much of the world a squirrel understands. We understand LLMs more than squirrels. Arguably we don’t know if LLMs are more intelligent than squirrels.

> Finally he says if you could recreate the intelligence of a squirrel you'd be most of the way toward AGI, but you can't do that with an LLM.

Again, he doesn't even have a quantitative baseline for what intelligence means for a squirrel, or for how intelligent a squirrel is compared to an LLM. We literally have no idea whether LLMs are more or less intelligent, and no direct means of comparing the two; it's apples and oranges.

danans 15 hours ago | parent [-]

> We have no idea how much of the world a squirrel understands. We understand LLMs more than squirrels

Based on our understanding of biology and evolution, we know that a squirrel's brain works far more similarly to ours than an LLM does.

To the extent we understand LLMs, it's because they are strictly less complex than both our brains and squirrels' brains, not because they are a better model of our intelligence. They are a thin simulation of human language generation, mediated via text.

We also see that a squirrel, like us, is capable of continuous learning driven by its own goals, all on an energy budget many orders of magnitude lower than an LLM's. That last part is strong empirical evidence that LLMs are a dead end for AGI, given that the real world imposes harsh energy constraints on biological intelligences.

Also remember that Sutton is still an AI maximalist. He isn't saying that AGI isn't possible, just that LLMs can't get us there.

LarsDu88 13 hours ago | parent | next [-]

I don't think a modern LLM is necessarily less complicated than a squirrel brain. If anything it's more engineered (well structured and dissectable), but loaded with tons of extraneous circuitry that is completely irrelevant for intelligence.

The squirrel brain is an analogue, mostly hardcoded circuit. It takes about one synapse to represent each "weight", and a synapse is just a bit of fatty membrane with some ion channels stuck on the surface.

A flip-flop to represent a bit takes about 6 transistors, but a typical modern GPU needs way more transistors to wire up that bit - at least 20-30. Multiply that by the minimum number of bits needed to represent a single NN weight and you're looking at about 200-300 transistors just to represent one NN param for compute.

And that's just for compute. Most of the time the actual weights live in DRAM, which has to be constantly shuttled back and forth between the GPU's SRAM and HBM.

300 transistors plus memory-shuttling overhead versus a bit of fatty membrane: it's obvious that general-purpose GPU compute carries a huge energy and compute overhead.

In the future, all 300 could conceivably be replaced with a single crossbar latch in the form of a memristor.
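
To put rough numbers on that back-of-envelope arithmetic (the transistor counts are just the estimates above, not measured hardware figures):

    # Back-of-envelope arithmetic; counts are rough estimates, not measurements.
    flip_flop_transistors = 6    # bare storage of one bit
    wired_bit_transistors = 25   # ~20-30 once GPU routing/wiring is included
    bits_per_weight = 10         # assume a low-precision ~10-bit weight format

    print("bare flip-flop storage per weight:", flip_flop_transistors * bits_per_weight)  # ~60
    print("with wiring overhead per weight:", wired_bit_transistors * bits_per_weight)    # ~250, i.e. the 200-300 range
    # versus roughly one synapse per "weight" in the squirrel brain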

ninetyninenine 14 hours ago | parent | prev [-]

> Based on our understanding of biology and evolution, we know that a squirrel's brain works far more similarly to ours than an LLM does.

Bro. Evolution is a random walk. That means most of the changes are random and arbitrary, based on whatever allows the squirrel to survive.

We know squirrels and humans diverged from a common ancestor, but we do not know how much has changed since then, we do not know what changed, and we do not know the baseline for what that common ancestor was.

Additionally, we don't even understand the current baseline. We have no idea how brains work. If we did, we would be able to build a human brain, but as of right now LLMs are the closest model we have ever created to something that simulates, or is remotely similar to, the brain.

So your fuzzy qualitative statement that we understand evolution and biology is baseless. We don't understand shit.

> We also see that a squirrel, like us, is capable of continuous learning driven by its own goals, all on an energy budget many orders of magnitude lower than an LLM's. That last part is strong empirical evidence that LLMs are a dead end for AGI.

So an LLM can't continuously learn? You realize that LLMs are deployed agentically all the time now, so they both continuously learn and follow goals? Right? You're aware of this, I hope.

The energy inefficiency is a byproduct of hardware. The theory of LLMs and machine learning is independent of the flawed silicon technology that causes the energy inefficiency. Just as a computer can be made mechanical, an LLM can be as well; the LLM is independent of the actual implementation and its energy inefficiencies. This is not at all a strong empirical indication that LLMs are a dead end. It's a strong indication that your thinking is illogical and flawed.

> Also remember that Sutton is still an AI maximalist. He isn't saying that AGI isn't possible, just that LLMs can't get us there.

He can't say any of this because he doesn't actually know. None of us know for sure. We literally don't know why LLMs work. The fact that training transformers on massive amounts of data produced this level of intelligence was a total surprise for all the experts, and we still have no idea why this stuff works. His statements are too overarching and gloss over a lot of things we don't actually know.

Yann LeCun, for example, called LLMs stochastic parrots. We now know this is largely incorrect. The reason LeCun can be so wrong is that nobody actually knows shit.

danans 14 hours ago | parent [-]

> Bro. Evolution is a random walk. That means most of the changes are random and arbitrary, based on whatever allows the squirrel to survive.

For the vast majority of evolutionary history, very similar forces have shaped us and squirrels. The mutations are random, but the selection is not.

If squirrels are a stretch for you, take the closest human relative: chimpanzees. There is a very reasonable hypothesis that their brains work very similarly to ours, far more similarly than ours to an LLM.

> So an LLM can't continuously learn? You realize that LLMs are deployed agentically all the time now, so they both continuously learn and follow goals?

That is not continuous learning. The network does not retrain through that process; it's all in the agent's context. The agent has no intrinsic goals, nor the ability to develop them. It merely samples based on its prior training and its current context. Biological intelligence, by contrast, retrains constantly.
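
A toy way to see the distinction (PyTorch used only for illustration; the Linear layer is a stand-in for a frozen LLM, not any agent framework's real API):

    import torch

    model = torch.nn.Linear(8, 8)   # stand-in for a frozen LLM
    context = []                    # the agent's scratchpad; the only thing that grows

    def agentic_step(observation):
        # "Agentic" deployment: the weights never change; the model just samples
        # conditioned on its prior training and whatever is in the context.
        context.append(observation)
        return model(torch.stack(context).mean(dim=0))

    def continual_learning_step(observation, target, lr=1e-3):
        # Continuous learning: every experience updates the weights themselves.
        loss = torch.nn.functional.mse_loss(model(observation), target)
        loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= lr * p.grad
        model.zero_grad()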

> The energy inefficiency is a byproduct of hardware. The theory of LLMs and machine learning is independent of the flawed silicon technology that causes the energy inefficiency.

There is no evidence to support that a transformer model's inefficiency is hardware based.

There is direct evidence that the inefficiency stems from the fact that LLM inference and training are both auto-regressive. Auto-regression maps to compute cycles, which map to energy consumption. That's a problem with the algorithm, not the hardware.
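
As a sketch of what I mean (model_forward here is a placeholder, not a real transformer):

    # Generating N tokens takes N sequential forward passes, each over the
    # growing prefix, so compute cycles (and energy) scale with output length.
    def generate(model_forward, prompt_tokens, n_new_tokens):
        tokens = list(prompt_tokens)
        for _ in range(n_new_tokens):
            next_token = model_forward(tokens)  # one full pass per emitted token
            tokens.append(next_token)
        return tokens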

> The fact that training transformers on massive amounts of data produced this level of intelligence was a total surprise for all the experts

The level of intelligence produced is only impressive compared to the prior state of the art, and even then it is impressive only at modeling the narrow band of intelligence represented by encoded language (not all language) produced by humans. In almost every other aspect of intelligence - notably continuous learning driven by intrinsic goals - LLMs fail.