agentcoops | 3 days ago
100%. My sense is that many in this thread have never gone through the misery of trying to use classical OCR on non-English documents, or on scans whose quality you can't control. I recently ran a test on 18th-century German documents written in a well-known, standardized, but archaic script. Classical models trained specifically on this corpus performed an order of magnitude worse than GPT-5. I haven't experimented with smaller models, personally or professionally, but your experience makes me hopeful that we might get this level of OCR accuracy on phones sooner rather than later...
bugglebeetle | 3 days ago | parent
William Mattingly has been doing a lot of work on similar documents in an archival context using vision-language models (VLLMs). You should check out their work: