| ▲ | jdeng 5 hours ago | |
Glad to see open source models are catching up and treating vision as a first-class citizen (a.k.a. native multimodal agentic models). GLM and Qwen models take a different approach, with a base model and a separate vision variant (glm-4.6 vs glm-4.6v). I guess after Kimi K2.5, other vendors are going to go the same route? Can't wait to see how this model performs on computer automation use cases like VITA AI Coworker. | ||