Stagnant 3 hours ago

Chrome ships a local OCR model for extracting text from PDFs that is better than any of the VLMs or open-source OCR models I've tried. I had a few hundred gigs of old newspaper scans, and after trying all the other options I ended up building a wrapper around the DLL it uses to get the text and bounding boxes. Performance and accuracy are on another level compared to Tesseract, and while VLMs sometimes produced good results, they just seemed unreliable.

I've thought about open-sourcing the wrapper but haven't gotten around to it yet. I bet Claude Code could build a functioning prototype if you just point it at the "screen_ai" directory under Chrome's user data.

mwcampbell 2 hours ago

What's the name of this DLL? I assume it's separate from the monster chrome.dll, and that the model is proprietary.

Stagnant 8 minutes ago

The DLL is named chrome_screen_ai.dll (libchromescreenai.so on Linux), and yes, it is proprietary. It isn't included by default: Chrome uses its component service to download it automatically when you open a PDF that doesn't already have OCR'd text. You can download it separately from here: https://chrome-infra-packages.appspot.com/p/chromium/third_p...
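A minimal Python sketch for finding the library once Chrome has downloaded it. The thread only gives the file names (chrome_screen_ai.dll, libchromescreenai.so) and says the files live under a "screen_ai" directory in Chrome's user data. The default user-data paths below are assumed to be the usual ones for a standard Chrome install, and the helper names `default_user_data_dir` and `find_screen_ai_libraries` are hypothetical. The sketch does not load or call the library, since the thread says nothing about its exported functions.

```python
from __future__ import annotations

import os
import sys
from pathlib import Path

# File names taken from the thread; other platforms are not covered.
LIBRARY_NAMES = {"chrome_screen_ai.dll", "libchromescreenai.so"}


def default_user_data_dir() -> Path:
    """Usual Chrome user-data location for a standard install (an assumption)."""
    if sys.platform.startswith("win"):
        return Path(os.environ.get("LOCALAPPDATA", "")) / "Google" / "Chrome" / "User Data"
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Google" / "Chrome"
    return Path.home() / ".config" / "google-chrome"


def find_screen_ai_libraries(user_data_dir: Path) -> list[Path]:
    """Return OCR library files found inside any 'screen_ai' directory under user_data_dir."""
    if not user_data_dir.is_dir():
        return []
    found = set()
    # The layout beneath screen_ai (e.g. a version subfolder) is not assumed,
    # so every screen_ai directory is searched recursively.
    for screen_ai in user_data_dir.rglob("screen_ai"):
        if not screen_ai.is_dir():
            continue
        for path in screen_ai.rglob("*"):
            if path.is_file() and path.name in LIBRARY_NAMES:
                found.add(path)
    return sorted(found)
```

Calling `find_screen_ai_libraries(default_user_data_dir())` should return an empty list if Chrome hasn't fetched the component yet, for example before any PDF without existing OCR text has been opened.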

zzleeper 3 hours ago

Surprisingly, I have a few hundred gigs of old newspaper scans too, so I'm very curious.

How fast was it per page? Do you recall whether it's CPU- or GPU-based? TY!