procaryote 2 days ago

LLMs work nothing like Karl Friston's free energy principle though

FloorEgg 2 days ago | parent [-]

LLMs embody the free-energy principle computationally. They maintain an internal generative model of language and continually minimize “surprise”, the difference between predicted and actual tokens, during both training and inference. In Friston’s terms, their parameters encode beliefs about the causes of linguistic input; forward passes generate predictions, and backpropagation adjusts internal states to reduce prediction error, just as perception updates beliefs to minimize free energy.

During inference, autoregressive generation can be viewed as active inference: each new token selection aims to bring predicted sensory input (the next word) into alignment with the model’s expectations. In a broader sense, LLMs exemplify how a self-organizing system stabilizes itself in a high-dimensional environment by constantly reducing uncertainty about its inputs, a synthetic analogue of biological systems minimizing free energy to preserve their structural and informational coherence.
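
To make the training half of that concrete, here's a toy sketch (made-up names and sizes, not any real model): the loss being minimized is just the average surprisal, -log p, of the tokens that actually occur.

    import torch
    import torch.nn as nn

    vocab_size, dim = 100, 32

    class TinyLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.head = nn.Linear(dim, vocab_size)

        def forward(self, tokens):
            return self.head(self.embed(tokens))  # logits = predictions over the next token

    model = TinyLM()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    tokens = torch.randint(0, vocab_size, (1, 16))   # stand-in for observed "sensory" input
    logits = model(tokens[:, :-1])                   # forward pass: generate predictions
    # cross-entropy = average surprisal (-log p) of the tokens that actually occurred
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
    loss.backward()                                  # adjust internal states to reduce prediction error
    opt.step()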

procaryote 2 days ago | parent [-]

You might have lost me, but what you're describing doesn't sound like an LLM. E.g.:

> each new token selection aims to bring predicted sensory input (the next word) into alignment with the model’s expectations.

What does that mean? An LLM generates the next word based on what best matches its training, with some level of randomisation. Then it does it all again. It's not a perceptual process trying to infer a reality from sensor data or anything.

FloorEgg 2 days ago | parent [-]

> An llm generates the next word based on what best matches its training, with some level of randomisation.

This is sort of accurate, but not precise.

An LLM generates the next token by sampling from a probability distribution over possible tokens, where those probabilities are computed from patterns learned during training on large text datasets.
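
In code, that step looks roughly like this (a minimal sketch with made-up numbers, not any particular model's API):

    import torch

    def sample_next_token(logits, temperature=1.0):
        if temperature == 0.0:
            return int(torch.argmax(logits))            # greedy: always pick the most likely token
        probs = torch.softmax(logits / temperature, dim=-1)
        return int(torch.multinomial(probs, num_samples=1))

    # toy 5-token "vocabulary"; temperature is the knob for the randomisation you mention
    logits = torch.tensor([2.0, 1.0, 0.5, 0.1, -1.0])
    next_token = sample_next_token(logits, temperature=0.8)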

The difference in our explanations is that yours leans towards treating LLMs as fancy database indexes, while I am emphasizing that LLMs build a model of what they are trained on and respond based on that model, which is more like how brains and cells work than you are recognizing. (Though I admit my understanding of microbiology places me just barely past peak Mt. Stupid [Dunning-Kruger]: I don't really understand how individual cells do this and can only explain it in a hand-wavy way.)

Both systems take input, pass it through a network of neurons, and produce output. Both systems are trying to minimize surprise in their predictions. The differences are primarily in scale and complexity. Human brains have more types of neurons (units) and more types of connections (parameters). LLMs more closely mimic the prefrontal cortex, whereas something like the brainstem differs far more in structure and cellular diversity.

You can make a subjective ontological choice to draw categorical boundaries between them, or you can plot them on a continuum of complexity and scale. Personally I think both framings are useful, and to exclude either is to exclude part of the truth.

My point is that if you draw a subjective categorical boundary around what you deem is consciousness and say that LLMs are outside of that, that is subjectively valid. You can also say that consciousness is a continuum, and individual cells, blades of grass, ants, mice, and people experience different types of consciousness on that continuum. If you take the continuum view, then what follows is a reasonable assumption that LLMs experience a very different kind of consciousness that takes in inputs at about the same rate as a small fish, models those inputs for a few seconds, and then produces outputs. What exactly that "feels" like is as foreign to me as it would be to you. I assume it's even more foreign than what it would "feel" like to be a blade of grass.

procaryote 2 days ago | parent [-]

I'm not sure why you'd describe "sampling from a probability distribution over possible tokens" as "minimize surprise in predictions" other than to make it sound similar to the free energy thing.

The free energy thing as I understand it has internal state, makes predictions, evaluates them against new input, and adjusts its internal state to continuously learn to predict new input better. This might, if you squint, look similar to training a neural network, although the mechanisms are different, but it's very distinct from the inference step.
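
Roughly this kind of loop, to be concrete (a toy sketch, everything here is made up):

    import torch
    import torch.nn as nn

    model = nn.Linear(1, 1)                        # tiny stand-in for an internal model
    opt = torch.optim.SGD(model.parameters(), lr=0.05)

    # continuous loop: predict, compare to new input, update internal state, repeat
    for step in range(200):
        x = torch.rand(1)                          # new input arrives
        observation = 2.0 * x + 1.0                # what actually happens
        prediction = model(x)                      # prediction from current internal state
        error = (prediction - observation).pow(2).mean()
        opt.zero_grad()
        error.backward()
        opt.step()                                 # internal state keeps adapting

    # an LLM at inference time just repeats the predict/sample step with frozen
    # weights; there is no update step in that loop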

FloorEgg a day ago | parent [-]

"Minimize surprise" and "maximize accurate predictions" are the same thing mathematically. Minimize free energy = minimize prediction error.

LLMs do everything modelled in the free energy principle; they just don't do continuous learning. (They don't do perceptual inference after RL.)

Your tone ("free energy thing" and "if you squint") comes off as dismissive and not intellectually honest. Here I thought you were actually curious, but I guess not?

procaryote a day ago | parent [-]

Poor wording on my side; I'm sorry. Thank you for explaining your reasoning.

FloorEgg 21 hours ago | parent [-]

Thank you for saying that :)