I’ve been trying to convert a dense 60-page paper document to Markdown today from photos taken on my iPhone. I know this is probably not the best way to do it, but it’s still been surprising to find that even the latest cloud models struggle to process many of the pages. Lots of hallucination and “I can’t see the text” (when the photo is perfectly clear). Lots of retrying different models, switching between LLMs and old-fashioned OCR, and reading and correcting mistakes myself. It’s still faster than doing the whole transcription manually, but I thought the tech was further along.

bugglebeetle 4 days ago

Try this:

https://github.com/rednote-hilab/dots.ocr

	mdaniel 4 days ago
		The code is MIT, and the weights are labeled MIT, although the actual license file in the weights repo seems to be mostly Apache 2: https://huggingface.co/rednote-hilab/dots.ocr/blob/main/NOTI... The weights seem to be about 6GB, which feels reasonable to manage locally.