montebicyclelo 7 hours ago

Reminds me of this [1] HN post from 9 months ago, where the author trained a neural network to do world emulation from video recordings of their local park — you can walk around in their interactive demo [2].

I don't have access to the DeepMind demo, but from the video it looks like it takes the idea up a notch.

(I don't know the exact lineage of these ideas, but as a general observation, it's a shame that blog posts and indie demos so rarely get cited.)

[1] https://news.ycombinator.com/item?id=43798757

[2] https://madebyoll.in/posts/world_emulation_via_dnn/demo/

ollin 4 hours ago | parent

Yup, similar concepts! Just at opposite extremes of the compute/scaling spectrum.

- That forest trail world is ~5 million parameters, trained on 15 minutes of video, and scoped to run on a five-year-old iPhone through a twenty-year-old API (WebGL GPGPU, i.e. OpenGL fragment shaders; see the sketch after this list). It's the smallest '3D' world model I'm aware of.

- Genie 3 is (most likely) ~100 billion parameters trained on millions of hours of video and running across multiple TPUs. I would be shocked if it's not the largest-scale world model available to the public.
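(For anyone wondering how you run a neural net through fragment shaders: you pack activations and weights into float textures, render a full-screen quad, and let each output pixel compute one dot product; rendering into a framebuffer-attached texture then feeds the next layer. Below is a minimal hypothetical sketch of one fully-connected layer — not the demo's actual code. All names and sizes are made up, and it assumes a trivial vertex shader plus an already-bound full-screen quad and float-texture support via OES_texture_float.)

    // Illustrative sketch only: one fully-connected layer as a WebGL 1
    // fragment shader. Each output pixel computes one dot product.
    const LAYER_FRAG = `
      precision highp float;
      uniform sampler2D u_input;    // 1 x N texture: input activations
      uniform sampler2D u_weights;  // N x M texture: texel (i, j) = w[i][j]
      uniform float u_n;            // input dimension N
      varying vec2 v_uv;            // this pixel's position = which output unit

      void main() {
        float acc = 0.0;
        // GLSL ES 1.0 requires a constant loop bound, so loop high and break.
        for (int i = 0; i < 4096; i++) {
          if (float(i) >= u_n) break;
          float u = (float(i) + 0.5) / u_n;
          float x = texture2D(u_input,   vec2(u, 0.5)).r;
          float w = texture2D(u_weights, vec2(u, v_uv.y)).r;
          acc += x * w;
        }
        gl_FragColor = vec4(max(acc, 0.0), 0.0, 0.0, 1.0); // ReLU activation
      }`;

    // Run one layer: bind input/weight textures, render into the framebuffer
    // whose attached texture becomes the next layer's input.
    function runLayer(gl: WebGLRenderingContext, prog: WebGLProgram,
                      input: WebGLTexture, weights: WebGLTexture,
                      n: number, outFbo: WebGLFramebuffer): void {
      gl.useProgram(prog);
      gl.bindFramebuffer(gl.FRAMEBUFFER, outFbo);
      gl.activeTexture(gl.TEXTURE0);
      gl.bindTexture(gl.TEXTURE_2D, input);
      gl.uniform1i(gl.getUniformLocation(prog, "u_input"), 0);
      gl.activeTexture(gl.TEXTURE1);
      gl.bindTexture(gl.TEXTURE_2D, weights);
      gl.uniform1i(gl.getUniformLocation(prog, "u_weights"), 1);
      gl.uniform1f(gl.getUniformLocation(prog, "u_n"), n);
      gl.drawArrays(gl.TRIANGLES, 0, 6); // full-screen quad = all outputs at once
    }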

There are lots of neat intermediate-scale world models being developed as well (e.g. LingBot-World https://github.com/robbyant/lingbot-world, Waypoint 1 https://huggingface.co/blog/waypoint-1) so I expect we'll be able to play something of Genie quality locally on gaming GPUs within a year or two.
