▲ | bbor 2 days ago | |
I'm very far from an expert, but:
The visual model "understands" it most readily, I'd say -- like a traditional Waymo CNN "understands" the 3D space of the road. I don't think they've explicitly given the models a pre-generated pointcloud of the space, if that's what you're asking. But maybe I'm misunderstanding?
It appears that the robot is being fed plain english instructions, just like any VLM would -- instead of the very common `text+av => text` paradigm (classifiers, perception models, etc), or the less common `text+av => av` paradigm (segmenters, art generators, etc.), this is `text+av => movements`.Feeding the robots the appropriate instructions at the appropriate time is a higher-level task than is covered by this demo, but I think is pretty clearly doable with existing AI techniques (/a loop).
If your question is "where's the GPUs", their "AI" marketing page[1] pretty clearly implies that compute is offloaded, and that only images and instructions are meaningfully "on board" each robot. I could see this violating the understanding of "totally local" that you mentioned up top, but IMHO those claims are just clarifying that the individual figures aren't controlled as one robot -- even if they ultimately employ the same hardware. Each period (7Hz?) two sets of instructions are generated.
Again, I don't work in robotics at all, but have spent quite a while cataloguing all the available foundational models, and I wouldn't describe anything here as "totally novel" on the model level. Certainly impressive, but not, like, a theoretical breakthrough. Would love for an expert to correct me if I'm wrong, tho!EDIT: Oh and finally:
Surely they are downplaying the difficulties of getting this setup perfectly, and don't show us how many bad runs it took to get these flawless clips.They are seeking to raise their valuation from ~$3B to ~$40B this month, sooooooo take that as you will ;) https://www.reuters.com/technology/artificial-intelligence/r... | ||
▲ | plipt 2 days ago | parent [-] | |
I think that answers most of my questions.I am also not in robotics, so this demo does seem quite impressive to me but I think they could have been more clear on exactly what technologies they are demonstrating. Overall still very cool. Thanks for your reply |