brumar 2 hours ago

Tangentially related: I don't think OCR is the right term, and I am generally vocal about that. But seeing it go unquestioned here, I am wondering if I am the one who is wrong. Is it OK to call this OCR? To me, OCR means text in the end, not visual tokens.

parsimo2010 2 hours ago

OCR means optical character recognition. The term does not require a direct transcription, although that is mostly what OCR meant in the past. If you're using an LLM's vision capability to pass in text and the LLM actually understands it, then I would say that it recognized the characters, hence OCR seems okay to use.

TurdF3rguson 2 hours ago

It's not. OCR is not what the vision model is doing here. We're used to using OCR as a verb but it's more accurate to say the model "visioned" it.

Also, some models still do OCR and it's usually way more expensive that way.

devmor 2 hours ago

So if I OCR a document, edit it, and print it, OCR didn't happen?