agentcoops 3 days ago

I've worked extensively with Tesseract, ABBYY, etc. in both personal and professional contexts. Of course they work well for English-language documents with simple layouts that are scanned without the slightest defect. At this point, based on extensive testing for work, state-of-the-art LLMs simply have better accuracy -- by an order of magnitude if you have non-English documents with complex layouts and less-than-ideal scans. I'll give you speed, but the accuracy is so much greater (and the need for human intervention so much less) that in my experience it's a worthwhile trade-off.

I'm not saying this applies to you, but my sense from this thread is that many are comparing the results of tossing an image into a free ChatGPT session with an "OCR this document" prompt to a competent Tesseract-based tool... LLMs certainly don't solve any and every problem, but the comparison should be based on real experiments. In fact, OCR is probably the main area where I've found them to simply be the best solution for a professional system.

privatelypublic 3 days ago

Yeah. As usual, I didn't articulate my point well. A tuned system with an optimized workflow will by far have the best results. And maybe LLMs will be a key resource in bringing OCR into usable/profitable areas.

But there's also a ton of "I don't want to deal with this" type work items that can't justify a full workflow build-out, but where LLMs get near enough to perfect to be "good enough." The bad part is, the LLMs don't explain to people the kinds of mistakes to expect from them.