solid_fuel an hour ago

No?

The image is still getting run through OCR and turned back into text before being fed into the LLM. There is no efficiency gain here; rather, we have learned that Anthropic is applying a discount to text fed in via OCR.

qingcharles an hour ago

I don't think this is what is happening. The models can genuinely "read" the text off the images, though usually with less-than-perfect accuracy, and the visual input costs the model fewer tokens than running OCR to convert the image into text and then sending that in. I do not think there is any intermediate stage where they are applying a free OCR in this situation. (I realize that happens in some situations.)