| ▲ | charcircuit 2 days ago | |
Visual reasoning models. Having a computer that can understand what is happening in the real world is very useful. | ||
| ▲ | ACCount37 2 days ago | |
Those are LLMs with an extra modality bolted onto them. Which is good: that it works this well speaks to the generality of autoregressive transformers, and the progress in "reasoning over image data" with things like Qwen3-VL is very impressive. It's a good capability to have. But it's not a separate thing from the LLM breakthrough at all. Even the more specialized real-time robotics AIs often have a bag of transformers backed by an actual LLM. | ||