ACCount37 2 days ago
Those are LLMs with an extra modality bolted onto them. Which is good - that it works this well speaks to the generality of autoregressive transformers, and the "reasoning over image data" progress with things like Qwen3-VL is very impressive. It's a good capability to have. But it's not a separate thing from the LLM breakthrough at all. Even the more specialized real-time robotics AIs often have a bag of transformers backed by an actual LLM.