MoonGhost 8 months ago
3D reconstruction from a stereo or mono camera (have both). Object detection, text reading. Ideally it should recognize the speaker and simple gestures. Take audio, feed the speech to an LLM, get the output. Being able to detect walking humans and move out of their way. Most of it has been done, like 3D structure and localization from motion. There are reference implementations, and I've done it before too. It's sort of an open-ended project. Having an LLM with vision on a mobile robot with an arm has a lot of applications. AGX Orin 64GB is capable of running serious models.
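For the stereo case, once the image pair is rectified and a disparity map is available, depth follows from the standard relation Z = f * B / d (focal length in pixels, baseline in meters, disparity in pixels). A minimal sketch of that one step; the function name `disparity_to_depth` and the camera numbers are made up for illustration, and a real pipeline would get the disparity map from a stereo matcher:

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a disparity map (pixels) to depth (meters) for a rectified stereo pair.

    Pixels with zero or negative disparity have no valid match and map to inf.
    """
    disparity = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    # Z = f * B / d, applied only where a match was found
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Hypothetical camera: 700 px focal length, 12 cm baseline.
depth = disparity_to_depth([[70.0, 35.0], [0.0, 7.0]], focal_px=700.0, baseline_m=0.12)
print(depth)  # roughly 1.2 m, 2.4 m, inf (no match), 12 m
```

Depth error grows quickly at small disparities, so for avoiding walking humans the near range is where stereo is most useful.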