▲ | danans 15 hours ago |
> We have no idea how much of the world a squirrel understands. We understand LLMs more than squirrels

Based on our understanding of biology and evolution, we know that a squirrel's brain works far more like ours than an LLM does. To the extent we understand LLMs, it's because they are strictly less complex than both our brains and squirrels' brains, not because they are a better model of our intelligence. They are a thin simulation of human language generation, mediated via text.

We also see that a squirrel, like us, is capable of continuous learning driven by its own goals, all on an energy budget many orders of magnitude lower than an LLM's. That last part is a strong empirical indication that LLMs are a dead end for AGI, given that the real world imposes harsh energy constraints on biological intelligences.

Also remember that Sutton is still an AI maximalist. He isn't saying that AGI isn't possible, just that LLMs can't get us there.
▲ | LarsDu88 13 hours ago | parent | next [-]
I don't think a modern LLM is necessarily less complicated than a squirrel brain. If anything it's more engineered (well structured and dissectable), but loaded with tons of extraneous circuitry that is completely irrelevant to intelligence.

The squirrel brain is an analogue, mostly hardcoded circuit. It takes about one synapse to represent each "weight", and a synapse is just a bit of fat membrane with some ion channels stuck on the surface. A flip flop to represent a bit takes about 6 transistors, but a typical modern GPU needs far more transistors to wire up that bit - at least 20-30. Multiply that by the minimum number of bits needed to represent a single NN weight and you're looking at roughly 200-300 transistors just to represent one NN param for compute.

And that's only the compute side. Most of the time the actual weights sit in DRAM and have to be constantly shuttled back and forth between the GPU's HBM DRAM and its on-chip SRAM. Three hundred transistors plus memory-shuttling overhead versus a bit of fat membrane - it's obvious that general purpose GPU compute carries a huge energy and compute overhead. In the future, all 300 could conceivably be replaced with a single crossbar latch in the form of a memristor.
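Spelling that arithmetic out as a rough sketch: the 20-30 transistors per bit is the comment's own estimate, and the ~10 bits per weight is an assumption fitted to its 200-300 figure, not a measured hardware number.

    # Back-of-envelope estimate of transistors needed per NN weight on a GPU.
    # All figures are assumptions from the comment above, not measured values.
    transistors_per_bit = (20, 30)  # assumed cost of wiring/storing one bit on a GPU datapath
    bits_per_weight = 10            # assumed minimum bits to represent one NN weight
    synapses_per_weight = 1         # biological comparison: roughly one synapse per "weight"

    low = transistors_per_bit[0] * bits_per_weight   # 20 * 10 = 200
    high = transistors_per_bit[1] * bits_per_weight  # 30 * 10 = 300

    print(f"GPU transistors per weight (compute only): ~{low}-{high}")
    print(f"Squirrel 'hardware' per weight: ~{synapses_per_weight} synapse")
    # Shuttling weights between HBM DRAM and on-chip SRAM adds further
    # energy overhead on top of this, as the comment notes.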
▲ | ninetyninenine 14 hours ago | parent | prev [-]
> Based on our understanding of biology and evolution we know that a squirrel understands its world more similarly to the way we do than an LLM.

Bro. Evolution is a random walk. That means most of the changes are random and arbitrary, kept only because they allowed the squirrel to survive. We know squirrels and humans diverged from a common ancestor, but we do not know how much has changed since then, we do not know what changed, and we do not know what the baseline of that common ancestor even was.

Additionally, we don't even understand the current baseline. We have no idea how brains work. If we did, we would be able to build a human brain, but as of right now LLMs are the closest model we have ever created to something that simulates or is even remotely similar to the brain. So your fuzzy qualitative claim that we understand evolution and biology is baseless. We don't understand shit.

> We also see that a squirrel, like us, is capable of continuous learning driven by its own goals, all on an energy budget many orders of magnitude lower. That last part is a strong empirical indication that LLMs are a dead end for AGI.

So an LLM can't continuously learn? You realize that LLMs are deployed agentically all the time now, so they both continuously learn and follow goals? Right? You're aware of this, I hope.

The energy inefficiency is a byproduct of hardware. The theory of LLMs and machine learning is independent of the flawed silicon technology that causes the inefficiency. Just as a computer can be made mechanical, an LLM can be as well. The LLM is independent of the actual implementation and its energy inefficiencies. This is not at all a strong empirical indication that LLMs are a dead end. It's a strong indication that your thinking is illogical and flawed.

> Also remember that Sutton is still an AI maximalist. He isn't saying that AGI isn't possible, just that LLMs can't get us there.

He can't say any of this because he doesn't actually know. None of us know for sure. We literally don't know why LLMs work. The fact that training transformers on massive amounts of data produced this level of intelligence was a total surprise to all the experts, and we still have no idea why this stuff works. His statements are too overarching and gloss over a lot of things we don't actually know. Yann LeCun, for example, called LLMs stochastic parrots. We now know this is largely incorrect. The reason Yann can be so wrong is that nobody actually knows shit.