| ▲ | fsiefken 9 hours ago | |
I am curious how these models would perform, and how much energy they would use, for near-real-time object detection: SmolVLM2-500M, Moondream 0.5B/2B/2.5B, and Qwen3-VL (3B) (https://huggingface.co/collections/Qwen/qwen3-vl). I am sure this is already being worked on in Russia, Ukraine and the Netherlands. A lot can go wrong with autonomous flying. One could run the VLM on a high-end Android phone mounted on the drone and keep dual control. | ||
| ▲ | SpyCoder77 6 hours ago | |
A better approach would be a VLA (vision-language-action model) rather than a VLM. VLAs are meant to take actions, whereas VLMs are for general use. https://cognitivedrone.github.io/ | ||