mritchie712 4 days ago:

> Why LLMs Suck at OCR

I paste screenshots into Claude Code every day and it's incredible. As in, I can't believe how good it is. I send a screenshot of console logs, a UI, and some HTML elements, and it just "gets it". So saying they "suck" makes me not take your opinion seriously.
ritvikpandey21 4 days ago:

Yeah, models are definitely improving, but we've found even the latest ones still hallucinate and infer text rather than doing pure transcription. We run rigorous benchmarks against all of the frontier models. We think the differentiation is in accuracy on truly messy docs (nested tables, degraded scans, handwriting) and in being able to deploy on-prem/VPC for regulated industries.
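(Editor's note: the thread doesn't say how these benchmarks are scored; a common choice for transcription accuracy is character error rate (CER), i.e. edit distance divided by reference length. A minimal illustrative sketch, not the poster's actual harness:)

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance:
    # minimum insertions/deletions/substitutions to turn a into b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cer(ground_truth: str, prediction: str) -> float:
    """Character error rate: edits needed / reference length.
    0.0 is a perfect transcription; hallucinated text inflates it."""
    if not ground_truth:
        return float(bool(prediction))
    return levenshtein(ground_truth, prediction) / len(ground_truth)
```

Run against a human-verified transcript, this cleanly separates "reads the text" from "infers plausible text", which is exactly the failure mode described above.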
mikert89 4 days ago:

They need to convince customers it's what they need.
|
|
serjester 4 days ago:

This is a hand-wavy article that dismisses VLMs without acknowledging the real-world performance everyone is seeing. I think it'd be far more useful if you published an eval.
|
mikert89 4 days ago:

One or two more model releases, and raw documents passed to Claude will beat whatever prompt voodoo you guys are cooking.
holler 4 days ago:

Having worked in this space, I have real doubts about that. Right now Claude and other top models already do a decent job at e.g. "generate OCR from this document". But as mentioned, there are serious failure modes: it's non-deterministic, and it's especially cost-prohibitive at scale.
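(Editor's note: the non-determinism complaint is straightforward to quantify: feed the same document through the model several times and measure how often the outputs agree. A minimal sketch, where `transcribe` is a hypothetical stand-in for whatever OCR call you use:)

```python
from collections import Counter

def stability(transcribe, image, runs=5):
    """Fraction of runs agreeing with the most common output.
    1.0 means the pipeline is deterministic on this input;
    anything lower means the model drifts between runs."""
    outputs = Counter(transcribe(image) for _ in range(runs))
    return outputs.most_common(1)[0][1] / runs
```

A classical OCR engine scores 1.0 on every input by construction; an LLM-based pipeline often doesn't, which is also why its per-document cost compounds when you add retries or majority voting at scale.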