| ▲ | toddmorey a day ago | |
I’m actually amazed at the output since GLM doesn’t have eyes. If GLM 5.2 costs 1/5 as much, it seems like it could be set up to reach out to a multimodal model for vision tasks when required. That would get it closer to parity while probably still being significantly cheaper. | ||
| ▲ | horsawlarway a day ago | parent | next [-] | |
I'm also very impressed by the output given the lack of image support. They picked a task that heavily favors a model that can do multi-modal with images, and GLM still came within striking distance. What I'm hearing from this article is that the next generation of open models, with better multi-modal support, is basically a no-brainer for adoption. Seems like a HUGE win for Z.ai and open models in general here. | ||
| ▲ | killingtime74 13 hours ago | parent | prev [-] | |
Yes, it could just make one call to a multimodal LLM to describe the scene. | ||
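A rough sketch of the routing idea discussed in this thread: a text-only model handles the task, and a separate multimodal model is called once to describe the image only when the task actually has one. Everything here is hypothetical. `Task`, `run_task`, `complete_text`, and `describe_image` are illustrative names, not any vendor's API, and the two callables stand in for whatever client code you would use to reach each model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical signatures, not a real vendor API:
# the vision model maps image bytes to a text description,
# the text-only model maps a prompt to a completion.
DescribeImage = Callable[[bytes], str]
CompleteText = Callable[[str], str]


@dataclass
class Task:
    prompt: str
    image: Optional[bytes] = None  # None means a pure-text task


def run_task(task: Task, complete_text: CompleteText,
             describe_image: DescribeImage) -> str:
    """Send the task to the text-only model, calling the vision model
    at most once, and only when an image is attached."""
    if task.image is None:
        # No vision needed: the cheaper text-only model does all the work.
        return complete_text(task.prompt)

    # One call to the multimodal model to turn the image into text.
    description = describe_image(task.image)
    augmented = (
        f"{task.prompt}\n\n"
        "Image description (from a separate vision model):\n"
        f"{description}"
    )
    return complete_text(augmented)
```

With stub callables in place of real model clients, `run_task(Task("What is shown?", image_bytes), text_fn, vision_fn)` makes exactly one vision call, while `run_task(Task("Summarize this"), text_fn, vision_fn)` makes none. Whether a single text description carries enough detail for a given task is an open question; this only shows the call pattern.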