sixo 4 days ago

Roughly, actual intelligence needs to maintain a world model in its internal representation, not merely an embedding of language. A world model is a very different data structure and will probably be learned in a very different way. It includes things like:

- a map of the world, or concept space, or a codebase, etc

- causality

- "factoring" which breaks down systems or interactions into predictable parts

Language alone is too blurry to do any of these precisely.
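
To make that concrete, here's a minimal sketch (mine, not the commenter's; every name and number below is made up) of what those three ingredients could look like as explicit data structures, versus a bare embedding table:

```python
# Illustrative only: one way the claimed distinction could look as data structures.
from dataclasses import dataclass, field

# A "language-like" representation: just a vector per token, no explicit structure.
token_embedding = {"river": [0.12, -0.45, 0.88], "bridge": [0.31, 0.02, -0.17]}

@dataclass
class WorldModel:
    """A minimal structured world model: a map, causal links, and factored parts."""
    positions: dict[str, tuple[float, float]] = field(default_factory=dict)  # spatial map
    causes: dict[str, list[str]] = field(default_factory=dict)               # causality graph
    parts: dict[str, list[str]] = field(default_factory=dict)                # factoring

world = WorldModel(
    positions={"river": (2.0, 5.0), "bridge": (2.0, 5.1)},
    causes={"rain": ["river_rises"], "river_rises": ["bridge_floods"]},
    parts={"bridge": ["deck", "piers", "railings"]},
)

def downstream_effects(model: WorldModel, event: str) -> set[str]:
    """Follow causal edges exactly -- something a fuzzy embedding can't do precisely."""
    seen, frontier = set(), [event]
    while frontier:
        e = frontier.pop()
        for effect in model.causes.get(e, []):
            if effect not in seen:
                seen.add(effect)
                frontier.append(effect)
    return seen

print(downstream_effects(world, "rain"))  # {'river_rises', 'bridge_floods'}
```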

coldtea 4 days ago | parent | next [-]

>Roughly, actual intelligence needs to maintain a world model in its internal representation

And how is that not like stored information (memories) with weighted links between them and/or between groups of them?
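
For what it's worth, that is essentially a description of associative memory. A tiny Hopfield-style sketch (my own toy example, not from the thread; the patterns are arbitrary): memories are stored patterns, and the "weighted links" are Hebbian weights between their components, which is enough to recall a stored memory from a corrupted cue:

```python
import numpy as np

# Memories as +/-1 patterns; "weighted links" as Hebbian outer-product weights.
memories = np.array([[ 1, -1, 1, -1,  1, 1],
                     [-1, -1, 1,  1, -1, 1]])
W = sum(np.outer(m, m) for m in memories).astype(float)
np.fill_diagonal(W, 0)  # no self-connections

def recall(cue, steps=5):
    state = cue.copy().astype(float)
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1
    return state

noisy = np.array([1, 1, 1, -1, 1, 1])  # first memory with one flipped bit
print(recall(noisy))                   # converges back to the stored pattern
```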

sixo 4 days ago | parent [-]

It probably is a lot like that! I imagine it's a matter of specializing the networks and learning algorithms to converge to world-model-like structures rather than language-like ones. All these models do is approximate the underlying manifold structure; it's just that the manifold structure of a causal world is different from that of language.
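
A hedged sketch of what "same kind of model, different objective" could mean in practice (my illustration; PyTorch is just a stand-in here, and every dimension and name is hypothetical): an identical, structurally boring network trained on next-token prediction versus next-state prediction, where the conjecture above is that the two end up approximating very different manifolds:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_net(d_in, d_out):
    # The same simple, compute-heavy but structurally boring block in both cases.
    return nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, d_out))

# "Language-like" objective: current token embedding -> next token embedding.
lm_net = make_net(64, 64)
def next_token_loss(tok_emb, next_tok_emb):
    return F.mse_loss(lm_net(tok_emb), next_tok_emb)

# "World-like" objective: (state, action) -> next state, i.e. learned dynamics.
wm_net = make_net(48 + 16, 48)
def next_state_loss(state, action, next_state):
    return F.mse_loss(wm_net(torch.cat([state, action], dim=-1)), next_state)

# Identical training loop either way; only the data and the target change.
state, action, next_state = torch.randn(8, 48), torch.randn(8, 16), torch.randn(8, 48)
next_state_loss(state, action, next_state).backward()
```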

astrange 4 days ago | parent | prev | next [-]

> Roughly, actual intelligence needs to maintain a world model in its internal representation

This is GOFAI metaphor-based development, which never once produced anything useful. They just sat around saying things like "people have world models," then decided that if they programmed something and called it a "world model" they'd get intelligence. It didn't work out, but they still went around claiming people have "world models," as if they hadn't just made it up.

An alternative thesis, "people do things that worked the last time they did them," explains both language and action planning better; e.g. you don't form a model of the contents of your garbage in order to take it to the dumpster.

https://www.cambridge.org/core/books/abs/computation-and-hum...
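
A toy rendering of that alternative thesis (my sketch, not something from the linked book; the situations and actions are made up): keep a cache of what worked last time per situation and reuse it, with no model of the world's contents at all:

```python
import random

last_success: dict[str, str] = {}   # situation -> action that worked last time

def act(situation: str, candidate_actions: list[str]) -> str:
    # Reuse the cached action if we have one; otherwise try something.
    return last_success.get(situation, random.choice(candidate_actions))

def observe_outcome(situation: str, action: str, succeeded: bool) -> None:
    if succeeded:
        last_success[situation] = action   # "it worked, do it again next time"
    else:
        last_success.pop(situation, None)  # forget it and try something else later

# e.g. taking out the garbage without ever modelling the bag's contents:
action = act("full garbage bag by the door", ["carry to dumpster", "ignore"])
observe_outcome("full garbage bag by the door", action, succeeded=True)
```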

sixo 4 days ago | parent [-]

I see no reason to believe an effective LLM-scale "world-modeling" model would look anything like the kinds of things previous generations of AI researchers were doing. It will probably look a lot more like a transformer architecture--big, compute-intensive, and with a fairly simple structure--but with a learning process that is different in some key way that makes different manifold structures fall out.

SweetSoftPillow 4 days ago | parent | prev [-]

Please check example #2 here: https://github.com/PicoTrex/Awesome-Nano-Banana-images/blob/...

It is not "language alone" anymore. LLMs are multimodal nowadays, and it's still just the beginning.

And keep in mind that these results are produced by a cheap, small and fast model.

mdaniel 4 days ago | parent | next [-]

I thought you were making an entirely different point with your link, since the loading lag left the page showing just the upskirt render until the rest of the images loaded in and it could scroll to the reference in your actual link.

Anyway, I don't think that's the flex you think it is, since the topographic map clearly shows the beginning of the arrow sitting in the river, and the rendered image decided to hallucinate a winding brook, as well as its little tributary to the west, in view of the arrow. I'm not able to decipher the legend [it ranges from 100m to 500m and back to 100m, so maybe the input was hallucinated too, for all I know], but I don't obviously see 3 distinct peaks, nor a basin between the snow-cap and the smaller mound.

I'm willing to be more liberal with the other two images, since "instructions unclear" about where the camera was positioned, but for the topographic one, it had a circle.

I know I'm talking to myself, though, given the tone of every one of these threads

devnullbrain 4 days ago | parent | prev [-]

Every one of those is the wrong angle