js4ever a day ago

"GLM-5.2 hit a problem here, because it can't read images. It isn't multimodal. So instead of looking at a screenshot, it fell back on a hacky workaround: it wrote scripts to read the raw pixel data and check whether the colors came out roughly as expected."

A better way would be to use https://github.com/openbmb/MiniCPM-V
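
For context, the pixel-level workaround described in the quote probably amounts to something like the sketch below. This is only an illustration, not the script the model actually wrote: the names `mean_color` and `roughly_matches` and the tolerance of 16 are my own assumptions, and the pixels are assumed to be already decoded into (R, G, B) tuples.

```python
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def mean_color(pixels: Iterable[RGB]) -> RGB:
    """Average each channel over all pixels, using integer division."""
    total_r = total_g = total_b = count = 0
    for r, g, b in pixels:
        total_r += r
        total_g += g
        total_b += b
        count += 1
    if count == 0:
        raise ValueError("no pixels to average")
    return (total_r // count, total_g // count, total_b // count)


def roughly_matches(pixels: Iterable[RGB], expected: RGB, tolerance: int = 16) -> bool:
    """True if every channel of the mean color is within `tolerance` of `expected`."""
    actual = mean_color(pixels)
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))


# Hypothetical region sampled from a rendered image that should be red.
region = [(250, 10, 12), (255, 0, 0), (248, 5, 9)]
print(roughly_matches(region, (255, 0, 0)))
```

A check like this only confirms that a region has about the right average color. It says nothing about layout or shapes, which is why a vision model is the better tool here.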

twobitshifter a day ago

Right, just give the text LLM access to a vision-specific agent and that problem is solved. Or, if you really want, let it call Opus with an image; it seems like you’d still save money.
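
A rough sketch of what that delegation could look like, assuming the text-only model emits tool calls as plain dicts. Everything here is hypothetical: `describe_image`, `make_tools`, `dispatch`, and the `VisionBackend` signature are illustrative names, and the backend is a stub rather than a real MiniCPM-V or Opus client.

```python
from typing import Callable, Dict

# Assumed interface: a vision backend takes image bytes plus a question
# and returns a text answer. A real one would wrap a vision model.
VisionBackend = Callable[[bytes, str], str]


def make_tools(vision: VisionBackend) -> Dict[str, Callable[..., str]]:
    """Build the tool table exposed to the text-only model."""

    def describe_image(path: str, question: str) -> str:
        # Read the screenshot from disk and hand it to the vision backend.
        with open(path, "rb") as f:
            return vision(f.read(), question)

    return {"describe_image": describe_image}


def dispatch(tools: Dict[str, Callable[..., str]], call: dict) -> str:
    """Run one tool call of the form {"name": ..., "arguments": {...}}."""
    name = call["name"]
    if name not in tools:
        return f"error: unknown tool {name!r}"
    return tools[name](**call["arguments"])


def stub_vision(image: bytes, question: str) -> str:
    """Placeholder backend that only reports what it received."""
    return f"received {len(image)} bytes; question was: {question}"
```

The text model never sees the image itself, only the text answer that comes back, so the expensive vision model is called only when a screenshot actually needs checking.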