deepsquirrelnet 3 days ago

Give the nanonets-ocr-s model a try. It’s a fine-tune of Qwen 2.5 VL, and I’ve had good success with it for Markdown and LaTeX output with image captioning. It uses a simple tagging scheme for page numbers, captions, and tables.

davidwritesbugs 3 days ago

I've tried nanonets, but it seems very sensitive to the prompt: changing it slightly turned the output to rubbish. When it worked, it was pretty good.

deepsquirrelnet 3 days ago

This is true. It’s not meant to be run with any prompt other than the one it was trained with; I found that out as well. It’s only meant for OCR. Qwen 2.5 VL is better if you need to change the prompt.

captainregex 3 days ago

I desperately wanted Qwen VL to work, but it just unleashes rambling hallucinations on basic screencaps. Going to try nanonets!