SR2Z a day ago

Right, but modeling the structure of language is a question of modeling word order and binding affinities. It's the Chinese Room thought experiment - can you get away with a form of "understanding" which is fundamentally incomplete but still produces reasonable outputs?

Language in itself attempts to model the world and the processes by which it changes. Knowing which parts of speech about sunrises appear together and where is not the same as understanding a sunrise - but you could make a very good case, for example, that understanding the same thing in poetry gets an LLM much closer.

hackinthebochs a day ago | parent | next [-]

LLMs aren't just modeling word co-occurrences. They are recovering the underlying structure that generates word sequences. In other words, they are modeling the world. This model is quite low fidelity, but it should be very clear that they go beyond language modeling. We all know of the pelican riding a bicycle test [1]. Here's another example of how various language models view the world [2]. At this point it's just bad faith to claim LLMs aren't modeling the world.

[1] https://simonwillison.net/2025/Aug/7/gpt-5/#and-some-svgs-of...

[2] https://www.lesswrong.com/posts/xwdRzJxyqFqgXTWbH/how-does-a...

SR2Z a day ago | parent | next [-]

The "pelican on a bicycle" test has been around for six months and has been discussed a ton on the internet; that second example is fascinating but Wikipedia has infoboxes containing coordinates like 48°51′24″N 2°21′8″E (Paris, notoriously on land). How much would you bet that there isn't a CSV somewhere in the training set exactly containing this data for use in some GIS system?

I think that "modeling the world" is a red herring, and that fundamentally an LLM can only model its input modalities.

Yes, you could say this about human beings, but I think a more useful definition of "model the world" is that a model needs to realize any facts that would be obvious to a person.

The fact that frontier models can easily be made to contradict themselves is proof enough to me that they cannot have any kind of sophisticated world model.

Terr_ 21 hours ago | parent | next [-]

> Wikipedia has infoboxes containing coordinates like 48°51′24″N 2°21′8″E

I imagine simply making a semitransparent green land-splat at every such Wikipedia coordinate would get you pretty close to a world map, given how much of the ocean won't get any coordinates at all... unless perhaps the training includes a compendium of deep-sea ridges and other features.
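
Something like this rough sketch is all that baseline would take (Python; infobox_coords is a hypothetical stand-in for coordinates scraped from article infoboxes):

    # Rasterize every infobox coordinate onto a lat/lon grid; cells that get
    # at least one hit are "probably land". No geographic knowledge needed.
    import numpy as np

    def splat_world_map(infobox_coords, cell_deg=1.0):
        n_lat, n_lon = int(180 / cell_deg), int(360 / cell_deg)
        grid = np.zeros((n_lat, n_lon), dtype=int)
        for lat, lon in infobox_coords:
            row = min(int((90 - lat) / cell_deg), n_lat - 1)   # 90 N maps to row 0
            col = min(int((lon + 180) / cell_deg), n_lon - 1)  # 180 W maps to col 0
            grid[row, col] += 1
        return grid > 0  # boolean "land-splat" mask

    # Tiny example: Paris, New York, Sydney
    coords = [(48.8567, 2.3522), (40.7128, -74.0060), (-33.8688, 151.2093)]
    print(splat_world_map(coords).sum(), "cells marked as land")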

skissane a day ago | parent | prev | next [-]

> The fact that frontier models can easily be made to contradict themselves is proof enough to me that they cannot have any kind of sophisticated world model.

A lot of humans contradict themselves all the time… therefore they cannot have any kind of sophisticated world model?

SR2Z 10 hours ago | parent [-]

A human generally does not contradict themselves within a single conversation, and when they do, they can usually provide a satisfying explanation of how to resolve the contradiction.

hackinthebochs a day ago | parent | prev [-]

>How much would you bet that there isn't a CSV somewhere in the training set containing exactly this data for use in some GIS system?

Maybe, but then I would expect more equal performance across model sizes. Besides, ingesting the data and being able to reproduce it accurately in a different modality is still an example of modeling. It's one thing to ingest a set of coordinates in a CSV indicating geographic boundaries and accurately reproduce that CSV. It's another thing to accurately indicate, in an entirely different context, whether arbitrary points fall inside or outside that boundary. This suggests a latent representation independent of the input tokens.
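
To make the distinction concrete, here's a rough sketch of that kind of probe (query_model is a hypothetical stand-in for whatever LLM client you'd actually call, and ground_truth_fn for a GIS lookup; a model that merely memorized a boundary CSV shouldn't do well on arbitrary points posed in plain language):

    import random

    def query_model(prompt: str) -> str:
        # Hypothetical stub: plug in whatever chat API you actually use.
        raise NotImplementedError

    def probe_land_water(ground_truth_fn, n_points=500, seed=0):
        """ground_truth_fn(lat, lon) -> 'land' or 'water', e.g. backed by a GIS dataset."""
        rng = random.Random(seed)
        correct = 0
        for _ in range(n_points):
            lat, lon = rng.uniform(-90, 90), rng.uniform(-180, 180)
            reply = query_model(
                f"Is the point at latitude {lat:.3f}, longitude {lon:.3f} "
                "on land or in water? Answer with one word."
            ).strip().lower()
            correct += (reply == ground_truth_fn(lat, lon))
        return correct / n_points  # accuracy on points unlikely to appear verbatim anywhere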

>I think that "modeling the world" is a red herring, and that fundamentally an LLM can only model its input modalities.

There are good reasons to think this isn't the case. To effectively reproduce text that is about some structure, you need a model of that structure. A strong learning algorithm should in principle learn the underlying structure represented with the input modality independent of the structure of the modality itself. There are examples of this in humans and animals, e.g. [1][2][3]

>I think a more useful definition of "model the world" is that a model needs to realize any facts that would be obvious to a person.

Seems reasonable enough, but it is at risk of being too human-centric. So much of our cognitive machinery is suited for helping us navigate and actively engage the world. But intelligence need not be dependent on the ability to engage the world. Features of the world that are obvious to us need not be obvious to an AGI that never had surviving predators or locating food in its evolutionary past. This is why I find the ARC-AGI tasks off target. They're interesting, and it will say something important about these systems when they can solve them easily. But these tasks do not represent intelligence in the sense that we care about.

>The fact that frontier models can easily be made to contradict themselves is proof enough to me that they cannot have any kind of sophisticated world model.

This proves that an LLM does not operate with a single world model. But that shouldn't be surprising. LLMs are unusual beasts in the sense that the capabilities you get largely depend on how you prompt them. There is no single entity or persona operating within the LLM; it's more of a persona-builder. Which model a given persona engages with is largely down to how it segmented the training data in the service of accurately modeling the various personas represented in human text. The lack of consistency is inherent to its design.

[1] https://news.wisc.edu/a-taste-of-vision-device-translates-fr...

[2] https://www.psychologicalscience.org/observer/using-sound-to...

[3] https://www.nature.com/articles/s41467-025-59342-9

homarp a day ago | parent | prev [-]

and we can say that a bastardized version of the Sapir-Whorf hypothesis applies: what's in the training set shapes or limits an LLM's view of the world

moron4hire a day ago | parent [-]

Neither Sapir nor Whorf presented Linguistic Relativism as their own hypothesis, and they never published together. The effect, if it exists at all, is very weak, considering it doesn't reliably replicate.

homarp a day ago | parent [-]

I agree that's the pop name.

Don't you think it replicates well for LLMs, though?

ajross a day ago | parent | prev [-]

> Knowing which parts of speech about sunrises appear together and where is not the same as understanding a sunrise

What does "understanding a sunrise" mean though? Arguments like this end up resting on semantics or tautology, 100% of the time. Arguments of the form "what AI is really doing" likewise fail because we don't know what real brains are "really" doing either.

I mean, if we knew how to model human language/reasoning/whatever, we'd just do that. We don't, and we can't. The AI boosters are betting that whatever it is (that we don't understand!) is an emergent property of enough compute power, and that all we need to do is keep cranking the data center construction engine. The AI pessimists, you among them, are mostly just arguing from Luddism: "this can't possibly work because I don't understand how it can".

Who the hell knows, basically. We're at an interesting moment where technology and the theory behind it are hitting the wall at the same time. That's really rare [1]; generally you know how something works, and applying it is just a question of figuring out how to build a machine.

[1] Another example might be some of the chemistry fumbling going on at the start of the industrial revolution. We knew how to smelt and cast metals at crazy scales well before we knew what was actually happening. Stuff like that.

subjectivationx 16 hours ago | parent | next [-]

Everyone reading this understands the meaning of a sunrise. It is a wonderful example of the use theory of meaning.

If you raised a baby inside a windowless solitary confinement cell for 20 years and then one day show them the sunrise on a video monitor, they still don't understand the meaning of a sunrise.

Trying to get a machine to extract the meaning of a sunrise from the syntax of a sunrise data corpus is just totally absurd.

You could extract some statistical regularity from the pixel data of the sunrise video monitor or sunrise data corpus. That model may provide some useful results that can then be used in the lived world.

Pretending the model understands a sunrise though is just nonsense.

Presenting the sunrise statistical model's usefulness in the lived world as proof that the model understands a sunrise borders, I would say, on intellectual fraud, considering that a human doing the same thing wouldn't understand a sunrise either.

ajross 15 hours ago | parent [-]

> Everyone reading this understands the meaning of a sunrise

For a definition of "understands" that resists rigor and repeatability, sure. This is what I meant by reducing it to a semantic argument. You're just saying that AI is impossible. That doesn't constitute evidence for your position. Your opponents in the argument who feel AGI is imminent are likewise just handwaving.

To wit: none of you people have any idea what you're talking about. No one does. So take off the high hat and stop pretending you do.

meroes 12 hours ago | parent [-]

This all just boils down to the Chinese Room thought experiment, where I'm pretty sure the consensus is that nothing in the experiment (not the person inside, not the whole emergent room, etc.) understands Chinese the way we do.

Another example from Searle: a computer simulating digestion is not digesting the way a stomach does.

The people saying AI can't emerge from LLMs are on the consensus side of the Chinese Room. The digestion simulator could tell us where every single atom of a stomach digesting a meal is, and it's still not digestion. Only once the computer simulation breaks down food particles chemically and physically is it digestion. Only once an LLM receives photons, or has a physical capacity to receive photons, is there anything like "seeing a night sky".

pastel8739 a day ago | parent | prev [-]

Is it really so rare? I feel like I know of tons of fields where we have methods that work empirically but don't understand all the theory. I'd actually argue that we don't know what's "actually" happening _ever_; we have only built enough understanding to do useful things.

ajross a day ago | parent [-]

I mean, most big changes in the tech base don't have that characteristic. Semiconductors require only 1920s physics to describe (and a ton of experimentation to figure out how to manufacture). The motor revolution of the early 1900s was all built on well-settled thermodynamics (chemistry lagged a bit, but you don't need a lot of chemical theory to burn stuff). Maxwell's electrodynamics explained all of industrial electrification but predated it by 50 years, etc...

skydhash a day ago | parent [-]

Those big changes always happen because someone presented a simpler model that explains things well enough that we can build on it. It's not like the raw materials for semiconductors weren't around.

The technologies around LLMs are fairly simple. What is not simple is the sheer size of the data being ingested and the number of resulting parameters (weights). We have a formula and the parameters to generate grammatically perfect text, but to obtain them, you need TBs of data to get GBs of numbers.
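
For a sense of how small the formula part really is, here's a rough sketch of scaled dot-product attention, the core operation inside a transformer block (toy sizes, NumPy only); all the difficulty lives in the TBs of data and the GBs of learned weights that get plugged into it:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        """Q, K, V: (seq_len, d) arrays of queries, keys, values."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)        # how strongly each token attends to each other token
        return softmax(scores, axis=-1) @ V  # weighted mix of the value vectors

    # Toy example: 4 tokens, 8-dimensional embeddings
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
    print(attention(Q, K, V).shape)  # (4, 8)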

In contrast, something like Turing machines or Church's lambda calculus is pure genius: less than a hundred pages of theorems that form one of the main pillars of the tech world.

ajross 15 hours ago | parent [-]

> Those big changes always happen because someone presented a simpler model that explains things well enough that we can build on it.

Again, no they don't. It didn't work that way with industrial steelmaking, which was ad hoc and lucky. It isn't with AI, which no one actually understands.

skydhash 10 hours ago | parent [-]

I'm pretty sure there were always formulas for getting high-quality steel, even before the industrial age. And you only need a few textbooks and papers to understand AI.