ollin 8 hours ago

Really great to see this released! Some interesting videos from early-access users:

- https://youtu.be/15KtGNgpVnE?si=rgQ0PSRniRGcvN31&t=197 walking through various cities

- https://x.com/fofrAI/status/2016936855607136506 helicopter / flight sim

- https://x.com/venturetwins/status/2016919922727850333 space station, https://x.com/venturetwins/status/2016920340602278368 Dunkin' Donuts

- https://youtu.be/lALGud1Ynhc?si=10ERYyMFHiwL8rQ7&t=207 simulating a laptop computer, moving the mouse

- https://x.com/emollick/status/2016919989865840906 otter airline pilot with a duck on its head walking through a Rothko-inspired airport

msabalau 3 hours ago

I was lucky enough to be an early tester. Here's a brief video walking through the process of creating worlds and showing examples: walking on the moon with a NASA photo as part of the prompt, being in 221B Baker Street with Holmes and Watson, wandering through a night market in Taipei as a giant boba milk tea (note how the stalls differ and sell different foods), and exploring the setting of my award-nominated tabletop RPG.

https://www.youtube.com/watch?v=FyTHcmWPuJE

It's an experimental research prototype, but it also feels like a hint of the future. Feel free to ask any questions.

llmthrow0827 2 hours ago

These are extremely impressive from a technological progression standpoint, and at the same time not at all compelling, in the same way AI images and LLM prose are and are not.

It's neat, I guess, that I can use a few words to generate the equivalent of an Unreal 5 asset flip and play around in it. But I will never do that, much less pay an ongoing compute cost for every second I'm doing it.

RaftPeople 6 hours ago

I liked that first one, and I hope someone makes one of going back to the dinosaur age. I want to see that.

post-it 4 hours ago

One step closer to the science-based dinosaur MMO we were promised.

echelon 6 hours ago

Tim is awesome.

Ironically, he covered PixVerse's world model last week and it came close to your ask: https://youtu.be/SAjKSRRJstQ?si=dqybCnaPvMmhpOnV&t=371

(Earlier in the video it shows him live prompting.)

World models are popping up everywhere, from almost every frontier lab.

Valk3_ 5 hours ago

Any thoughts about Project Genie?

ollin 3 hours ago

On a technical level, this looks like the same diffusion transformer world model design that was shown in the Genie 3 post (text/memory/d-pad input, video output, 60sec max context, 720p, sub-10FPS control latency due to 4-frame temporal compression). I expect the public release uses a cheaper step-distilled / quantized version. The limitations seen in Genie 3 (high control latency, gradual loss of detail and drift towards videogamey behavior, 60s max rollout length) are still present. The editing/sharing tools, latency, cost, etc. can probably improve over time with this same model checkpoint, but new features like audio input/output, higher resolution, precise controls, etc. likely won't happen until the next major version.
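To make the control-rate point concrete, here is a back-of-the-envelope sketch of how 4-frame temporal compression caps how often a new d-pad input can take effect. It assumes a 24 fps output frame rate (not stated above) and that actions are applied once per generated latent chunk. The function names are purely illustrative, not any real Genie API.

```python
# Back-of-the-envelope sketch: how temporal compression caps control rate.
# ASSUMPTIONS (not from the thread): 24 fps output, one action applied per
# latent chunk. Only the 4-frame compression and 60 s context come from the comment.

OUTPUT_FPS = 24            # assumed output frame rate
FRAMES_PER_LATENT = 4      # temporal compression factor mentioned in the comment
MAX_CONTEXT_SECONDS = 60   # max context / rollout length mentioned in the comment


def control_rate_hz(output_fps: float, frames_per_latent: int) -> float:
    """New actions can only take effect once per generated latent chunk."""
    return output_fps / frames_per_latent


def min_action_interval_ms(output_fps: float, frames_per_latent: int) -> float:
    """Time between successive opportunities to apply a new action.

    This is a lower bound on responsiveness; real end-to-end latency
    also includes inference and streaming time.
    """
    return 1000.0 / control_rate_hz(output_fps, frames_per_latent)


def latent_steps_in_context(output_fps: float, frames_per_latent: int, seconds: float) -> int:
    """Number of latent chunks needed to cover the full context window."""
    return int(seconds * output_fps) // frames_per_latent


if __name__ == "__main__":
    rate = control_rate_hz(OUTPUT_FPS, FRAMES_PER_LATENT)
    interval = min_action_interval_ms(OUTPUT_FPS, FRAMES_PER_LATENT)
    steps = latent_steps_in_context(OUTPUT_FPS, FRAMES_PER_LATENT, MAX_CONTEXT_SECONDS)
    print(f"control updates: {rate:.1f} Hz")            # 6.0 Hz
    print(f"min action interval: {interval:.0f} ms")     # 167 ms
    print(f"latent steps in {MAX_CONTEXT_SECONDS} s: {steps}")  # 360
```

Under these assumptions the input rate lands well under 10 updates per second regardless of how fast inference gets, which is why finer-grained control would likely need a model change rather than just a faster checkpoint.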

From a product perspective, I still don't have a good sense of what the market for WMs will look like. There's a tension between serious commercial applications (robotics, VFX, gamedev, etc., where you want way, way higher fidelity and very precise controllability) and the current application, short-form demos for consumer entertainment (where you want inference cheap enough to be ad-supported and an interface that's simple and intuitive). Framing Genie as a "prototype" inside their most expensive AI plan makes a lot of sense while GDM figures out how to target the product commercially.

On a personal level, since I'm also working on world models (albeit very small local ones https://news.ycombinator.com/item?id=43798757), my main thought is "oh boy, lots of work to do". If everyone starts expecting Genie 3 quality, local WMs need to become a lot better :)