montebicyclelo | 7 hours ago
Reminds me of this [1] HN post from 9 months ago, where the author trained a neural network to emulate a world from video recordings of their local park; you can walk around in their interactive demo [2]. I don't have access to the DeepMind demo, but from the video it looks like it takes the idea up a notch. (I don't know the exact lineage of these ideas, but as a general observation, it's a shame that blog posts and indie demos so rarely get cited.)

[1] https://news.ycombinator.com/item?id=43798757

[2] https://madebyoll.in/posts/world_emulation_via_dnn/demo/
ollin | 4 hours ago
Yup, similar concepts! Just at two opposite extremes of the compute/scaling spectrum.

- That forest trail world is ~5 million parameters, trained on 15 minutes of video, and scoped to run on a five-year-old iPhone through a twenty-year-old API (WebGL GPGPU, i.e. OpenGL fragment shaders; sketched below). It's the smallest '3D' world model I'm aware of.

- Genie 3 is (most likely) ~100 billion parameters, trained on millions of hours of video and running across multiple TPUs. I would be shocked if it's not the largest-scale world model available to the public.

There are lots of neat intermediate-scale world models being developed as well (e.g. LingBot-World https://github.com/robbyant/lingbot-world, Waypoint 1 https://huggingface.co/blog/waypoint-1), so I expect we'll be able to play something of Genie quality locally on gaming GPUs within a year or two.
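For anyone curious what "WebGL GPGPU" means in practice, here's a minimal TypeScript/GLSL sketch of the general pattern (my illustration, not the demo's actual code): activations live in float textures, each layer is a fragment shader, and a layer's forward pass is one full-screen draw into a framebuffer-attached output texture. The shader body is a toy 3-tap blur standing in for a real convolution, and the texture sizes are placeholders.

```typescript
// Hypothetical sketch of the WebGL2 GPGPU pattern: a network layer
// expressed as a fragment shader, run by drawing a full-screen
// triangle into a framebuffer-attached float texture.

const vertSrc = `#version 300 es
void main() {
  // Full-screen triangle derived from gl_VertexID; no vertex buffers.
  vec2 p = vec2((gl_VertexID << 1) & 2, gl_VertexID & 2);
  gl_Position = vec4(p * 2.0 - 1.0, 0.0, 1.0);
}`;

const fragSrc = `#version 300 es
precision highp float;
uniform sampler2D u_input; // previous layer's activations
out vec4 outColor;
void main() {
  ivec2 p = ivec2(gl_FragCoord.xy);
  // Toy "layer": a 3-tap horizontal blur standing in for a real
  // convolution; a real model would read weights from a second texture.
  vec4 acc = texelFetch(u_input, p + ivec2(-1, 0), 0)
           + texelFetch(u_input, p, 0)
           + texelFetch(u_input, p + ivec2(1, 0), 0);
  outColor = acc / 3.0;
}`;

function compile(gl: WebGL2RenderingContext, type: number, src: string) {
  const s = gl.createShader(type)!;
  gl.shaderSource(s, src);
  gl.compileShader(s);
  if (!gl.getShaderParameter(s, gl.COMPILE_STATUS)) {
    throw new Error(gl.getShaderInfoLog(s) ?? "shader compile failed");
  }
  return s;
}

const gl = document.createElement("canvas").getContext("webgl2")!;
gl.getExtension("EXT_color_buffer_float"); // allow float render targets

const prog = gl.createProgram()!;
gl.attachShader(prog, compile(gl, gl.VERTEX_SHADER, vertSrc));
gl.attachShader(prog, compile(gl, gl.FRAGMENT_SHADER, fragSrc));
gl.linkProgram(prog);

// Output activations: an RGBA32F texture attached to a framebuffer.
const W = 256, H = 256; // placeholder spatial resolution
const outTex = gl.createTexture()!;
gl.bindTexture(gl.TEXTURE_2D, outTex);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA32F, W, H, 0, gl.RGBA, gl.FLOAT, null);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
const fbo = gl.createFramebuffer()!;
gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                        gl.TEXTURE_2D, outTex, 0);

// One layer's forward pass = one draw call. (Input-texture binding is
// omitted here; a full model ping-pongs between two textures, reading
// one while writing the other, one pass per layer.)
gl.viewport(0, 0, W, H);
gl.useProgram(prog);
gl.drawArrays(gl.TRIANGLES, 0, 3);
```

The appeal of this approach is that render-to-texture has been supported essentially everywhere for ages, so a small model stated this way runs on old phone GPUs with no compute-shader or WebGPU support required.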