brumar 2 hours ago
Tangentially related: I don't think OCR is the right term, and I am generally vocal about that. But seeing it go unquestioned here, I wonder whether I am the one who is wrong. Is it OK to call this OCR? To me, OCR means you end up with text, not visual tokens.
parsimo2010 2 hours ago
OCR means optical character recognition. The term itself does not require a direct transcription, though that is mostly what OCR has meant in the past. If you’re using an LLM’s vision capability to pass in text and the LLM actually understands it, then I would say that it recognized the characters, hence OCR seems okay to use.
TurdF3rguson 2 hours ago
It's not. OCR is not what the vision model is doing here. We're used to using OCR as a verb, but it's more accurate to say the model "visioned" it. Also, some models still do OCR, and it's usually way more expensive that way.
devmor 2 hours ago
So if I OCR a document, edit it, and print it, OCR didn't happen?