StableAlkyne | 4 hours ago
How do these compare to something like Tesseract? I remember that one topping the leaderboard for many years, and usually it's the one I grab for OCR needs due to its reputation.
kergonath | 4 hours ago
Tesseract does not understand layout. It's fine for character recognition, but if I still have to pipe the output to an LLM to make sense of the layout and fix common transcription errors, I might as well use a single model. It's also easier for a visual LLM to extract figures and tables in one pass.
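Not anyone's exact setup, but a rough sketch of what that two-step pipeline tends to look like (assuming pytesseract, Pillow, and the OpenAI client; the model name and prompt are placeholders):

    # Sketch of the "Tesseract first, then an LLM to fix layout" pipeline.
    # Assumes pytesseract + Pillow + the openai client are installed;
    # the model name and system prompt below are illustrative only.
    from PIL import Image
    import pytesseract
    from openai import OpenAI

    def ocr_then_fix(image_path: str) -> str:
        # Raw OCR: characters are mostly right, layout is mostly lost
        raw = pytesseract.image_to_string(Image.open(image_path))
        client = OpenAI()
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[
                {"role": "system",
                 "content": "Reconstruct the document layout and fix obvious "
                            "OCR errors. Return Markdown; do not invent text."},
                {"role": "user", "content": raw},
            ],
        )
        return resp.choices[0].message.content

Which is kind of the point: once you're paying for the second step anyway, a single vision model doing both passes starts to look simpler.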
chaps | 4 hours ago
Tesseract v4 was exceptionally good when it was released and blew everything else out of the water. Have used it to OCR millions of pages. Tbh, I miss the simplicity of Tesseract. The new models are a similar step up over Tesseract v4. But what I'll say is: don't expect new models to be a panacea for your OCR problems. The edge cases you might be trying to solve (like identifying anchor points, or identifying shared field names across documents) are still pretty much all problematic. So you should still expect things like random spaces or unexpected characters to jam up your jams. Also, some newer models tend to hallucinate incredibly aggressively. If you've ever seen an LLM get stuck in an infinite loop, think of that.
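For what it's worth, the random-spaces/stray-character stuff usually gets handled with a dumb post-processing pass rather than by the OCR engine itself. A minimal sketch (the regexes and the output directory name are just illustrative, not a fix-all):

    # Minimal post-OCR cleanup: drop non-printable junk and collapse
    # runs of spaces. Patterns are illustrative; tune for your documents.
    import re
    from pathlib import Path

    def clean(text: str) -> str:
        # keep printable characters plus newlines/tabs
        text = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
        text = re.sub(r"[ \t]{2,}", " ", text)  # collapse space runs
        text = re.sub(r" +\n", "\n", text)      # strip trailing spaces
        return text

    # hypothetical directory of per-page OCR output
    for txt in Path("ocr_output").glob("*.txt"):
        txt.write_text(clean(txt.read_text()))

It won't solve the anchor-point or shared-field problems, but it keeps the obvious garbage from jamming up downstream matching.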