qwertox 7 hours ago

Isn't it more like this: JEPA watches the video ("a dog walks out of the door, the mailman comes, the dog is happy") and predicts what the next moment has to contain: "the mailman must move to the mailbox, the dog will run happily towards him." An image/video generator would then still need to render that.

Genie watches the video and learns: "when this group of pixels looks like this and the user presses 'jump', I will render that group differently, in this way, in the next frame."

Genie is an artist drawing a flipbook. To tell you what happens next, it must draw the page. If it doesn't draw it, the story doesn't exist.

JEPA is a novelist writing a summary. To tell you what happens next, it just writes "The car crashes." It doesn't need to describe what the twisted metal looks like to know the crash happened.
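
To make the flipbook/summary split concrete, here is a toy sketch of how I picture it, not how either model is actually built. Everything in it is made up for illustration: the frame size, the 16-number latent, the chunk-averaging `encode`, and both `*_step` functions are stand-ins for learned networks. The only point is where the prediction and the loss live: in pixels for the flipbook, in a short latent vector for the summary.

```python
import random

FRAME_PIXELS = 64 * 64  # one toy grayscale frame, flattened (assumed size)
LATENT_DIM = 16         # length of the abstract "summary" (assumed size)

def encode(frame):
    """Hypothetical encoder: average equal chunks of pixels into a short vector."""
    chunk = len(frame) // LATENT_DIM
    return [sum(frame[i * chunk:(i + 1) * chunk]) / chunk for i in range(LATENT_DIM)]

def pixel_space_step(frame, action):
    """Flipbook style: the output IS the next frame, conditioned on the action.
    Rotating the pixel list is a stand-in for a learned renderer."""
    shift = {"left": -1, "right": 1, "jump": 64}.get(action, 0)
    return frame[shift:] + frame[:shift]

def latent_space_step(latent):
    """Summary style: predict the next latent only; no pixels are produced.
    Scaling by 0.9 is a stand-in for a learned predictor."""
    return [0.9 * z for z in latent]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
frame_t = [random.random() for _ in range(FRAME_PIXELS)]
frame_t1 = [random.random() for _ in range(FRAME_PIXELS)]  # the "true" next frame

# Flipbook: predict every pixel, compare against every pixel.
pred_frame = pixel_space_step(frame_t, "jump")
pixel_loss = mse(pred_frame, frame_t1)

# Summary: predict the next latent, compare against the encoded next frame.
pred_latent = latent_space_step(encode(frame_t))
latent_loss = mse(pred_latent, encode(frame_t1))

print(len(pred_frame), len(pred_latent))  # 4096 16
```

The flipbook side has to commit to 4096 values to say anything about the future; the summary side only commits to 16, and turning those into an image is somebody else's job.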
