Remix.run Logo
toddmorey a day ago

I’m actually amazed at the output since GLM doesn’t have eyes. If GLM 5.2 costs 1/5 as much, seems like it could be set up to reach out to a multimodal model for vision tasks when required. Closer to parity but probably still significantly cheaper.

horsawlarway a day ago | parent | next [-]

I'm also very impressed at the output given the lack of image support.

They picked a task that heavily favors a model that can do multi-modal with images, and GLM still came within striking distance.

What I'm hearing from this article is that the next generation of open models that includes better multi-modal support are basically no-brainers for adoption.

Seems like a HUGE win for Z.ai and open models in general here.

killingtime74 13 hours ago | parent | prev [-]

Yes, it could just make one call to a multimodal llm to describe the scene