mbeavitt 10 hours ago

Honestly, I've been doing a lot of image-related work recently, and the biggest thing here for me is that images can now be submitted at 3x higher resolution. This is huge for anyone working with graphs, scientific photographs, etc. The accuracy of a simple automated photograph processing pipeline I recently implemented with Opus 4.6 was about 40%, which surprised me (simple OCR and recognition of basic features). It'll be interesting to see if 4.7 does much better.

I wonder if general purpose multimodal LLMs are beginning to eat the lunch of specific computer vision models - they are certainly easier to use.

adrian_b 6 hours ago

I assume that by "higher resolution images" you mean images with a bigger size in pixels.

I expect that for the model it does not matter what the actual resolution of the images is in pixels per inch or pixels per meter. What matters is that the model has limits on the maximum width and maximum height of images, expressed in pixels.

orrito 9 hours ago

Did you try the same with the Gemini 3 models? Those usually score higher on vision benchmarks.