nl 2 hours ago

They are called "world models" because the internal representation of what they display represents a "world" instead of a video frame or image. The model needs to "understand" geometry and physics to output a video.

The fact that the output contains errors doesn't mean it isn't significant. If a machine learning model understands how physical objects interact with each other, that is very useful.

godelski an hour ago

> what they display represents a "world" instead of a video frame or image.
Do they?

I'm unconvinced. The tiger and girl video is the clearest example. Nothing about it seems to represent a world.

slashdave an hour ago

> The model needs to "understand" geometry and physics to output a video.

No, it doesn't. It merely needs to mimic.

PunchyHamster an hour ago

I think the reason is "those words look nice on promo material". It is absolutely built to trigger hype from the clueless.