Not necessarily. They can generate letters, tokens, or words in any order. They can even write them all at once, as a diffusion model does. Next-token generation (auto-regression) is just a design choice of GPT, mostly for practical reasons. It fits the task at hand naturally (we humans also generate words in sequential order). Also, GPT has to be trained in a self-supervised manner, since we don't have labeled internet-scale data. Auto-regression solves that problem as well.
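To make the self-supervision point concrete, here is a minimal sketch of how raw text supplies its own labels under next-token prediction. The function name `next_token_pairs` and the whitespace split (standing in for a real subword tokenizer) are assumptions for illustration, not how GPT is actually implemented.

```python
# Toy illustration only: `next_token_pairs` is a hypothetical helper and the
# whitespace split is a stand-in for a real subword tokenizer.

def next_token_pairs(text):
    """Build (context, target) training pairs from raw text, with no manual labels."""
    tokens = text.split()
    # Each target is simply the token that follows its context in the text.
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```

Every prefix of the text becomes an input and the token after it becomes the label, so no human annotation is needed.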

The distinction I want to emphasize is that they don't just predict words statistically. They model the world, understand different concepts and their relationships, can think on them, can plan and act on the plan, can reason up to a point, in order to generate the next token. They learn all of this via that training scheme. They don't learn just the frequency of word relationships, unlike the old algorithms. Trillions of parameters do much more than that.
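For contrast, here is a rough sketch of a purely frequency-based predictor, assuming "the old algorithms" means something like n-gram models (the comment doesn't say which ones). The names `train_bigram` and `predict_next` are made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical bigram model: it only counts which word follows which.

def train_bigram(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    tokens = text.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

model = train_bigram("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # most frequent follower of "the"
print(predict_next(model, "dog"))  # unseen word, so no prediction
```

A model like this has nothing to say about a word it never saw, which is the kind of limitation the "just frequencies" framing refers to.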

griffzhowl 5 days ago | parent | next [-]

> The distinction I want to emphasize is that they don't just predict words statistically. They model the world, understand different concepts and their relationships, can think on them, can plan and act on the plan, can reason up to a point, in order to generate the next token.

This sounds way overblown to me. What we know is that LLMs generate sequences of tokens, and they do this by clever ways of processing the textual output of millions of humans.

You say that, in addition to this, LLMs model the world, understand, plan, think, etc.

I think it can look like that, because LLMs are averaging the behaviours of humans who are actually modelling, understanding, thinking, etc.

Why do you think that this behaviour is more than simply averaging the outputs of millions of humans who understand, think, plan, etc.?

ozgung 5 days ago | parent [-]

> Why do you think that this behaviour is more than simply averaging the outputs of millions of humans who understand, think, plan, etc.?

This is why it’s important to make the distinction that Machine Learning is a different field than Statistics. Machine Learning models do not “average” anything. They learn to generalize. Deep Learning models can handle edge cases and unseen inputs very well. In addition to that, OpenAI etc. probably use a specific post-training step (like RLHF or better) for planning, reasoning, following instructions step by step etc. This additional step doesn’t depend on the outputs of millions of humans.

HarHarVeryFunny 5 days ago | parent | prev | next [-]

How can an LLM model the world, in any meaningful way, when it has no experience of the world?

An LLM is a language model, not a world model. It has never once had the opportunity to interact with the real world and see how it responds - to emit some sequence of words (the only type of action it is capable of generating), predict what will happen as a result, and see if it was correct.

During training the LLM will presumably have been exposed to some second-hand accounts (as well as fictional stories) of how the world works, mixed up with sections of Stack Overflow code and Reddit rantings, but even those occasional accounts of real world interactions (context, action + result) are only at best teaching it about the context that someone else, at that point in their life, saw as relevant to mention as causal/relevant to the action outcome. The LLM isn't even privy to the world model of the raconteur (let alone the actual complete real world context in which the action was taken, or the detailed manner in which it was performed), so this is a massively impoverished source of second-hand experience from which to learn.

It would be like someone who had spent their whole life locked in a windowless room reading randomly ordered paragraphs from other people's diaries of daily experience (also randomly interspersed with chunks of fairy tales and Python code), without themselves ever having actually seen a tree or jumped in a lake, or ever having had the chance to test which parts of the mental model they had built, of what was being described, were actually correct or not, and how it aligned with the real outside world they had never laid eyes on.

When someone builds an AGI capable of continual learning, and sets it loose in the world to interact with it, then it'll be reasonable to say it has its own world model of how the world works, but as far as pre-trained language models go, it seems closer to the mark to say that they are indeed just language models, modelling the world of words which is all they know, and the only kind of model for which they had access to feedback (next word prediction errors) to build.

istjohn 5 days ago | parent [-]

We build mental models of things we have not personally experienced all the time. Such mental models lack the detail and vividness of those of someone with first-hand experience, but they are nonetheless useful. Indeed, a student of physics who has never touched a baseball may have a far more accurate and precise mental model of a curve ball than a major league pitcher.

HarHarVeryFunny 4 days ago | parent [-]

Sure, but the nature of the model can only reflect the inputs (incl. corrections) that it was built around. A theoretical model of the aerodynamics of a curve ball isn't going to make the physics prof an expert pitcher; they may not be able to throw a curve ball at all.

Given the widely different natures of a theoretical "book smart" model vs a hands-on model informed by the dynamics of the real world and how it responds to your own actions, it doesn't seem useful to call these the same thing.

For sure the LLM has, in effect, some sort of distributed statistical model of its training material, but this is not the same as knowledge represented by someone/something that has hands-on world knowledge. You wouldn't train an autonomous car to drive by giving it an instruction manual and stories of people's near-miss experiences - you'd train it in a simulator (or better yet, the real world), where it can learn a real world model - a model of the world you want it to know about and be effective in, not a WORD model of how drivers are likely to describe their encounters with black ice and deer on the road.

istjohn 4 days ago | parent [-]

You're moving the goal posts. OP wrote:

> The distinction I want to emphasize is that they don't just predict words statistically. They model the world, understand different concepts and their relationships, can think on them, can plan and act on the plan, can reason up to a point, in order to generate the next token.

You replied:

> How can an LLM model the world, in any meaningful way, when it has no experience of the world?

> An LLM is a language model, not a world model.

No one in this discussion has claimed that LLMs are effective general purpose agents, able to throw a curve ball, or drive a vehicle. The claim is that they do model the world in a meaningful sense. You may be able to make a case for that being false, but the assumption that direct experience is required to form a model of a certain domain is not an assumption we make of humans. Some domains, such as mathematics, can only be accessed through abstract reasoning, but it's clear that mathematicians form models of mathematical objects and domains that cannot be directly experienced.

I feel like you are arguing against a claim much stronger than what is being made. No one is arguing that LLMs understand the world in the same way humans do. But they do form models of the world.

jurgenaut23 5 days ago | parent | prev [-]

Can you provide sources for your claim that LLMs “model the world”?

ozgung 5 days ago | parent | next [-]

You are right that it is a bold claim but here is a relevant summary: https://en.wikipedia.org/wiki/Stochastic_parrot#Interpretabi...

I think "The Platonic Representation Hypothesis" is also related: https://phillipi.github.io/prh/

Unfortunately, large LLMs like ChatGPT and Claude are black boxes to researchers. They can't probe what is going on inside those things.

lgas 5 days ago | parent | prev [-]

It seems somewhat obvious to me. Language models the world, and LLMs model language. If A models B and B models C then A models C, as well, no?

TurboTveit 5 days ago | parent [-]

Can you provide sources for your claim that language “models the world”?