agentcoops 3 days ago
Have you evaluated this lately? Last year, or even earlier this year, I would have mostly agreed with you. At this point, however, at least on the documents I have been working with, OCR reliability with GPT-5 or Mistral OCR [1] has been much better than even domain-trained classical OCR. If the documents have even a slightly complex layout (to say nothing of page numbers, page headings, or an uncommon font), the accuracy of state-of-the-art LLMs has been, in my work, an order of magnitude greater. The ability to have the LLM tentatively join sentences that trail across page breaks, which is especially useful if you have to work with documents in German, say, is invaluable.
zarzavat 3 days ago
I asked GPT-5 to OCR a table for me the other day, and it hallucinated perhaps 10% of the values. This was a screenshot of a spreadsheet with a large font, not challenging except for the layout. What's interesting is that I also asked it to read the background colors of the cells, and it did much worse on that task. I believe these models could be useful for a first pass if you are willing to manually review everything they output, but the failure mode is unsettling.