ritvikpandey21 (4 days ago):
We disagree! We've found that LLMs by themselves aren't enough: they suffer from big failure modes like hallucination and inferring text rather than transcribing it faithfully. We wrote a blog post about this [1]. The right approach so far seems to be a hybrid workflow that uses very specific parts of the language-model architecture.
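For the curious, here is a minimal sketch of what one such hybrid workflow could look like: a classical OCR engine does the transcription, and the LLM is restricted to repairing recognition errors rather than reading the page itself. The engine choice (pytesseract), model ID, and prompt are illustrative assumptions, not the authors' actual pipeline.

    # Hypothetical hybrid OCR sketch: a traditional OCR engine produces the
    # raw transcription, and an LLM is used only to repair obvious
    # recognition errors, never to "read" the page itself.
    import anthropic
    import pytesseract
    from PIL import Image

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def hybrid_ocr(image_path: str) -> str:
        # Step 1: deterministic transcription from a classical OCR engine.
        raw_text = pytesseract.image_to_string(Image.open(image_path))

        # Step 2: LLM cleanup pass, constrained to the OCR output it was
        # given. The prompt forbids inferring text that is not present in
        # the input, which is the failure mode described above.
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",  # assumed model; swap as needed
            max_tokens=2048,
            messages=[{
                "role": "user",
                "content": (
                    "Fix character-level OCR errors in the text below. "
                    "Do not add, infer, or reorder content.\n\n" + raw_text
                ),
            }],
        )
        return response.content[0].text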
mritchie712 (4 days ago):
> Why LLMs Suck at OCR

I paste screenshots into Claude Code every day and it's incredible. As in, I can't believe how good it is. I send a screenshot of console logs, a UI, and some HTML elements, and it just "gets it". So saying they "suck" makes me not take your opinion seriously.
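For anyone who wants to reproduce this outside the chat UI, here is a minimal sketch using the Anthropic Python SDK to send a screenshot alongside a question; the filename, model ID, and prompt are placeholders.

    # Minimal sketch of sending a screenshot to Claude via the Anthropic
    # Python SDK (the chat UI does the equivalent under the hood).
    import base64
    import anthropic

    client = anthropic.Anthropic()

    with open("console_log.png", "rb") as f:  # hypothetical screenshot
        image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # assumed model; swap as needed
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",
                            "data": image_b64}},
                {"type": "text",
                 "text": "What does this console error mean and how do I fix it?"},
            ],
        }],
    )
    print(response.content[0].text)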
| ||||||||||||||
serjester (4 days ago):
This is a hand-wavy article that dismisses VLMs without acknowledging the real-world performance everyone is seeing. I think it'd be far more useful if you published an eval.
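A bare-bones version of the eval being asked for would score each pipeline's output against ground-truth transcriptions by character error rate. A sketch in pure Python; the pipeline callables and data layout are placeholders.

    # Sketch of a minimal OCR eval: character error rate (CER) of each
    # candidate pipeline against ground-truth transcriptions.
    from typing import Callable

    def edit_distance(a: str, b: str) -> int:
        # Classic dynamic-programming Levenshtein distance.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    def cer(hypothesis: str, reference: str) -> float:
        # Character error rate: edits needed per reference character.
        return edit_distance(hypothesis, reference) / max(len(reference), 1)

    def evaluate(pipeline: Callable[[str], str],
                 samples: list[tuple[str, str]]) -> float:
        # samples: (image_path, ground_truth_text) pairs; lower is better.
        return sum(cer(pipeline(path), truth)
                   for path, truth in samples) / len(samples)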
mikert89 (4 days ago):
One or two more model releases, and raw documents passed to Claude will beat whatever prompt voodoo you guys are cooking.
| ||||||||||||||