pzo 3 days ago

There have been so many open source OCR models in the last 3 months that it would be good to compare against those, especially when some are not even 1B params and can be run on edge devices.

- paddleOCR-VL

- olmOCR-2

- chandra

- dots.ocr

I kind of miss having leaderboard sections or arenas for OCR and CV, and providers hosting those models. They are neglected on both Artificial Analysis and OpenRouter.

culi 2 days ago | parent | next [-]

Someone posted a project here about a month ago where they compare models in head-to-head matchups, similar to LMArena.

https://www.ocrarena.ai/leaderboard

It hasn't been updated for Mistral yet, but so far Gemini seems to top the leaderboard.

jeffbee 2 days ago | parent | next [-]

OCR developers from decades past must be slapping their foreheads now that it seems users will wait a whole minute per page and be happy.

delaminator 2 days ago | parent | next [-]

What they are happy about is accurate OCR.

Getting the wrong answer really quickly is not the best goal.

culi 2 days ago | parent | prev [-]

You can also sort by latency. dots.ocr has the lowest at 3.8s/page. And although it doesn't fare very well against much larger, slower models, it's still streets ahead of traditional OCR techniques.

andai 2 days ago | parent | prev | next [-]

How can something have a very high Elo but a very low win rate?

BlackLotus89 2 days ago | parent [-]

You lose hardly any Elo if the opponent you lose to is much stronger than you. Draws could in theory play a part as well.
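To make that concrete, here is a minimal sketch of the textbook Elo update. It assumes the classic formula with a 400-point scale and a K-factor of 32; the leaderboard linked above may compute its ratings differently, and the ratings used here are made up for illustration.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Textbook Elo expected score (a draw counts as 0.5)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def rating_change(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """Rating delta after one game; score is 1.0 win, 0.5 draw, 0.0 loss."""
    return k * (score - expected_score(rating, opponent))


# Hypothetical ratings, chosen only to illustrate the effect.
loss_to_stronger = rating_change(1500, 1900, score=0.0)  # small penalty
loss_to_equal = rating_change(1500, 1500, score=0.0)     # full-size penalty
draw_vs_stronger = rating_change(1500, 1900, score=0.5)  # a draw gains points

print(f"loss vs +400 opponent: {loss_to_stronger:+.2f}")
print(f"loss vs equal opponent: {loss_to_equal:+.2f}")
print(f"draw vs +400 opponent: {draw_vs_stronger:+.2f}")
```

Under these assumptions a model that mostly faces much stronger opponents can lose most of its matchups and still keep a high rating, because each of those losses costs only a few points while a win or draw against them pays out a lot.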

pplonski86 2 days ago | parent | prev [-]

Very nice comparison! I'd like to see which examples the OCR engines fail on.

pzo 3 days ago | parent | prev | next [-]

What I like about MistralOCR is that they have simple pricing, $1/1k pages, and the API is hosted on their servers. With other OCR options it is hard to compare pricing because they are token based, and you don't know how many tokens an image is unless you run your own test.

E.g. with Gemini 3.0 Flash it might seem that model pricing increased only slightly compared to Gemini 2.5 Flash, until you test it and see that what used to be 258 input tokens per 384x384 image is now around 3x more.
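A rough sketch of that comparison, converting token-based pricing into a per-1k-pages figure that can be set next to a flat rate. Only the 258 tokens per tile, the "around 3x" factor, and the $1/1k pages rate come from the comments above. The price per million tokens and the number of tiles per page are hypothetical placeholders to replace with your own measurements, and output tokens are ignored.

```python
def cost_per_1k_pages(tokens_per_page: float, usd_per_million_tokens: float) -> float:
    """Input-token cost in USD for 1,000 pages."""
    return tokens_per_page * 1000 * usd_per_million_tokens / 1_000_000


def breakeven_tokens_per_page(flat_usd_per_1k_pages: float, usd_per_million_tokens: float) -> float:
    """Tokens per page at which token pricing matches a flat per-page rate."""
    return flat_usd_per_1k_pages * 1000 / usd_per_million_tokens


TOKENS_PER_TILE_OLD = 258        # from the comment: per 384x384 input
TOKEN_MULTIPLIER_NEW = 3         # from the comment: "around 3x more"
TILES_PER_PAGE = 6               # hypothetical: depends on your scan resolution
USD_PER_MILLION_TOKENS = 0.50    # hypothetical placeholder, not a real list price
FLAT_USD_PER_1K_PAGES = 1.00     # flat rate quoted in the comment above

old_tokens = TOKENS_PER_TILE_OLD * TILES_PER_PAGE
new_tokens = old_tokens * TOKEN_MULTIPLIER_NEW

old_cost = cost_per_1k_pages(old_tokens, USD_PER_MILLION_TOKENS)
new_cost = cost_per_1k_pages(new_tokens, USD_PER_MILLION_TOKENS)
breakeven = breakeven_tokens_per_page(FLAT_USD_PER_1K_PAGES, USD_PER_MILLION_TOKENS)

print(f"old: ${old_cost:.3f} per 1k pages, new: ${new_cost:.3f} per 1k pages")
print(f"token pricing beats the flat rate below {breakeven:.0f} tokens/page")
```

The point of the sketch is that the per-token price alone says little: the tokens-per-page figure has to be measured on your own documents before the two pricing schemes can be compared.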

gunalx 2 days ago | parent | next [-]

But they doubled the price for this new mistralocr3 model to $2

amelius 2 days ago | parent | prev [-]

Simple would be to bill per character.

Now I have to figure out how large a page can be.

andai 2 days ago | parent | prev | next [-]

I spent like three hours trying to get one of these running and then gave up. I think the paddleOCR one.

It took an hour and a half to install 12 gigabytes of pytorch dependencies that can't even run on my device, and then it told me it had some sort of versioning conflict. (I think I was supposed to use uv, but I had run out of steam by that point.)

Maybe I should have asked Claude to install it for me. I gave Claude root on a $3 VPS, and it seems to enjoy the sysadmin stuff a lot more than I do...

Incidentally, I had a similar experience installing Open WebUI... It installed 12 GB of pytorch crap. I rage quit and deleted the whole thing, and replicated the functionality I actually needed in 100 lines of HTML... Too bad I can't do that with OCR ;)

CamperBob2 2 days ago | parent [-]

gemini-cli is good for this sort of thing. You can just tell it "Find out why xyz.py doesn't run" and let it crunch. It will try reasonably hard to get you out of Python dependency hell, and (more important) it generally knows when to give up.

But yes, in general, you want to use uv. Otherwise, the next Python application you install WILL break the last one you installed.

I suppose you could use gemini-cli as a substitute for proper Python virtual environment management, always letting it fix whatever broke since the last time you tried to run the program, but that'd be like burning down a rainforest to toast a marshmallow.

andai 2 days ago | parent [-]

Actually, I just remembered, this was inside uv!

hereme888 3 days ago | parent | prev | next [-]

https://www.codesota.com/ocr

jammo 2 days ago | parent | prev [-]

[dead]