hackinthebochs, a day ago:
LLMs aren't just modeling word co-occurrences. They are recovering the underlying structure that generates word sequences. In other words, they are modeling the world. This model is quite low fidelity, but it should be very clear that they go beyond language modeling. We all know of the pelican-riding-a-bicycle test [1]. Here's another example of how various language models view the world [2]. At this point it's just bad faith to claim LLMs aren't modeling the world.

[1] https://simonwillison.net/2025/Aug/7/gpt-5/#and-some-svgs-of...

[2] https://www.lesswrong.com/posts/xwdRzJxyqFqgXTWbH/how-does-a...
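For anyone who wants to try it, here is a minimal sketch of the kind of land/water probe [2] describes: sample a coarse latitude/longitude grid, ask a chat model whether each point is land or water, and draw the answers as an ASCII map. The model name, prompt wording, and grid resolution are my own assumptions, not the post's exact setup.

    # Minimal sketch of a land/water probe in the spirit of [2].
    # Assumes the OpenAI Python SDK and OPENAI_API_KEY in the environment;
    # model name and prompt are placeholders.
    from openai import OpenAI

    client = OpenAI()

    def is_land(lat: float, lon: float) -> bool:
        """Ask the model for a one-word land/water judgment at (lat, lon)."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; [2] probes several models
            messages=[{
                "role": "user",
                "content": (
                    f"Is the point at latitude {lat}, longitude {lon} on land "
                    "or in water? Answer with exactly one word: land or water."
                ),
            }],
        )
        return resp.choices[0].message.content.strip().lower().startswith("land")

    def ascii_world_map(step: int = 10) -> str:
        """Probe a coarse grid and return '#' for land, '.' for water."""
        rows = []
        for lat in range(80, -90, -step):              # north to south
            row = "".join(
                "#" if is_land(lat, lon) else "."
                for lon in range(-180, 180, step)      # west to east
            )
            rows.append(row)
        return "\n".join(rows)

    if __name__ == "__main__":
        print(ascii_world_map())

If the printed '#' pattern resembles the continents, the model is answering from something more than local word statistics; that's the whole point of the probe.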
SR2Z, a day ago:
The "pelican on a bicycle" test has been around for six months and has been discussed a ton on the internet; that second example is fascinating but Wikipedia has infoboxes containing coordinates like 48°51′24″N 2°21′8″E (Paris, notoriously on land). How much would you bet that there isn't a CSV somewhere in the training set exactly containing this data for use in some GIS system? I think that "modeling the world" is a red herring, and that fundamentally an LLM can only model its input modalities. Yes, you could say this about human beings, but I think a more useful definition of "model the world" is that a model needs to realize any facts that would be obvious to a person. The fact that frontier models can easily be made to contradict themselves is proof enough to me that they cannot have any kind of sophisticated world model. | |||||||||||||||||||||||||||||
homarp, a day ago:
And we can say that a bastardized version of the Sapir-Whorf hypothesis applies: what's in the training set shapes, or limits, an LLM's view of the world.
|