The dawn of a world simulator

LarsDu88 10 minutes ago

I feel like there's a bit of a disconnect between the cool video demos shown here and, say, the type of world models someone like Yann LeCun is talking about.

A proper world model like JEPA should be predicting in latent space, where the representation of what is going on is highly abstract.

Video generation models by definition are predicting in either noise or pixel space (latent noise, if the diffuser is diffusing in a variational autoencoder's latent space).

It seems like what this lab is doing is quite vanilla, and I'm wondering if they are doing any sort of research in less demo-sexy joint embedding predictive spaces.

There was a recent paper, LeJEPA, from LeCun and a postdoc that actually fixes many of the mode/distribution collapse issues with the JEPA embedding models I just mentioned.

nl 21 minutes ago

The reason they are called "world models" is because the internal representation of what they display represents a "world" instead of a video frame or image. The model needs to "understand" geometry and physics to output a video.

Just because there are errors in this doesn't mean it isn't significant. If a machine learning model understands how physical objects interact with each other that is very useful.

godelski 2 hours ago

As a machine learning researcher, I don't get why these are called world models.

Visually, they are stunning. But it's nowhere near physical. I mean, look at that video with the girl and lion. The tail teleports between legs and then becomes attached to the girl instead of the tiger.

Just because the visuals are high quality doesn't mean it's a world model or has learned physics. I feel like we're conflating these things. I'm much happier to call something a world model if its visual quality is dogshit but it is consistent with its world. And I say "its world" because it doesn't need to be consistent with ours.

nurettin an hour ago

> Visually, they are stunning.

The input images are stunning; the model's result is another disappointing trip to the uncanny valley. But we feel OK as long as the sequence doesn't horribly contradict the original image or sound. That is the world model.

IAmGraydon an hour ago

> As a machine learning researcher, I don't get why these are called world models.

It's called "world models" because it's a grift. An out-in-the-open, shameless grift. Investors, pile on.

	godelski 4 minutes ago
		I'm just trying to be a bit more political, as it can be hard to communicate the issues. My first degree is actually in physics, and I'll just say... over there "world model" implies something very different.

superb_dev 8 hours ago

None of these example videos seem like the kind of “experiments” that they’re talking about simulating with these models.

I was expecting them to test a simple hypothesis and compare the model's results to a real-world test.

	ainiriand 7 hours ago
		It is not a world simulator; it looks like a world fantasy.

rmnclmnt 7 hours ago

For a minute I was like (spoiler alert) « wow, the creepy sci-fi theories from the DEVS TV show are taking place »… then I looked up the video, and it’s just video generation at this point.

	qingcharles an hour ago
		That's where this is headed, though. That's the end game.

pedalpete 8 hours ago

This looks interesting, but can someone explain to me how this is different from video generators using the previous frames as inputs to expand on the next frame?

Is this more than recursive video? If so, how?

smusamashah 7 hours ago

See the demo on their homepage. Calling it a world simulator is a marketing gimmick. It's a worse video generator, but you can interact with it in real time and direct the video a little bit. The next version of this thing will be worth looking at; this one isn't.

	netsharc 4 hours ago
		It's Plato's cave, but in color! https://en.wikipedia.org/wiki/Allegory_of_the_cave / https://www.youtube.com/watch?v=1RWOpQXTltA&t=56s
	vrighter 31 minutes ago
		Why would you assume anything about "the next version"?
	Animats 3 hours ago
		> Calling it a world simulator is a marketing gimmick.

		Yes, it should be called an AI Metaverse. It does do a nice job of short-term prediction. That's useful as a component of common sense.
	nowittyusername 5 hours ago
		There is so much marketing BS around these things; it drives me nuts. And it doesn't help that the large labs and credible individuals like Denis use these terms. "World models" are video generators with contextual memory, but that term is so misplaced. When one thinks of a "world model", you expect the thing to at least be driven by a physics engine from its foundation, not the other way around, where everything is generated and assumed at best.

anigbrowl 5 hours ago

This appears to be a simulator that produces only nice things.

	01HNNWZ0MV43FF 5 hours ago
		Only SFW, too.

nylonstrung 5 hours ago

I can't wait for companies like this to run out of money

arminiusreturns 4 hours ago

I'm doing a metasim in full 3D with physics. I just keep running into the limitations of the video format, but it is amazing when done right. The other big concern is licensing of the output.