pmarreck 10 days ago

Downloading the MLX version of "Qwen2.5-VL-32b-Instruct-8bit" via LM Studio right now since it's not yet available on Ollama and I can run it locally... I have an OCR side project for it to work on, want to see how performant it is on my M4... will report back

hdjjhhvvhga 9 days ago | parent [-]

I'm very curious about the results - I've been using mistral-ocr for the last 2 weeks and it has worked really well.

pmarreck 9 days ago | parent [-]

Its errors are interesting (averaging around one per paragraph). They're semantically correct but wrong on precision. Simple example: the English word "ardour" is transcribed as "ardor", and a foreign word like "palazzo", which is intended to stay as written, is translated to "palace". I'm still messing with temp/presence/frequency/top-p/top-k/prompting to see if I can squeeze some more precision out of it, but I'm running out of time.
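For anyone wanting to count slips like "ardour" vs "ardor" rather than eyeball them, here is a rough sketch of a word-level diff against a hand-checked reference, using only Python's standard `difflib`. The function name and the sample sentences are made up for illustration; they are not from the actual test page.

```python
import difflib


def word_substitutions(reference: str, ocr_output: str) -> list[tuple[str, str]]:
    """Return (reference_words, ocr_words) pairs wherever the two texts differ.

    Hypothetical helper: compares whitespace-separated words, so punctuation
    attached to a word counts as part of that word.
    """
    ref_words = reference.split()
    ocr_words = ocr_output.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)

    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Covers replacements, insertions, and deletions alike.
            differences.append(
                (" ".join(ref_words[i1:i2]), " ".join(ocr_words[j1:j2]))
            )
    return differences


# Invented sample text that reproduces the two kinds of slip described above.
reference = "He spoke with ardour of the old palazzo by the river"
ocr_output = "He spoke with ardor of the old palace by the river"

for expected, got in word_substitutions(reference, ocr_output):
    print(f"expected {expected!r}, got {got!r}")
```

Dividing the number of differences by the word count gives a crude word error rate, which makes it easier to compare one sampling setting against another.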

Not sure if it matters but I exported a PDF page as a PNG with 200dpi resolution, and used that.
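To put the 200dpi figure in concrete terms, a small sketch of the arithmetic, assuming the default PDF unit of 72 points per inch. The US Letter page size (612 x 792 points) is only an assumed example; the actual page size isn't stated here.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit: 1 point = 1/72 inch


def page_pixels(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI (hypothetical helper)."""
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


# Assumed page size: US Letter, 8.5 x 11 inches = 612 x 792 points.
print(page_pixels(612, 792, 200))
```

Under that assumption the image comes out at 1700 x 2200 pixels. Whether the model sees it at full size or rescales it first depends on the runtime's image preprocessing, which I haven't checked.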

It seems like it's reading the text but getting the details wrong.

I would not be comfortable using this in an official capacity without more accuracy. I could see using this for words that another OCR system is uncertain about, though, as a fallback.
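A minimal sketch of that fallback idea, assuming the primary OCR engine reports a per-word confidence between 0 and 1. Everything here is hypothetical: `OcrWord`, `merge_with_fallback`, the 0.80 threshold, and the stubbed fallback that stands in for a call to the vision model.

```python
from typing import Callable, NamedTuple


class OcrWord(NamedTuple):
    text: str
    confidence: float  # assumed range 0.0 to 1.0, from the primary OCR engine


def merge_with_fallback(
    words: list[OcrWord],
    fallback: Callable[[int, OcrWord], str],
    threshold: float = 0.80,
) -> list[str]:
    """Keep confident words as-is; ask the fallback only about uncertain ones."""
    merged = []
    for index, word in enumerate(words):
        if word.confidence < threshold:
            merged.append(fallback(index, word))
        else:
            merged.append(word.text)
    return merged


# Invented primary OCR output with one low-confidence word.
primary = [
    OcrWord("Good", 0.99),
    OcrWord("rnorning", 0.41),
    OcrWord("everyone", 0.97),
]

# Stub in place of a real model call; it returns a canned answer by position.
canned_answers = {1: "morning"}


def stub_fallback(index: int, word: OcrWord) -> str:
    return canned_answers.get(index, word.text)


print(merge_with_fallback(primary, stub_fallback))
```

Limiting the model to the uncertain words keeps its tendency to normalize spellings away from text the primary engine already read confidently.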