rdos 5 hours ago

Is it possible for such a small model to outperform gemini 3 or is this a case of benchmarks not showing the reality? I would love to be hopeful, but so far an open source model was never better than a closed one even when benchmarks were showing that.

amluto 5 hours ago | parent | next [-]

Off the top of my head: for a lot of OCR tasks, it's kind of worse for the model to be smart. I don't want my OCR to make stuff up or answer questions; I want it to recognize what is actually on the page.

rdos 4 hours ago | parent [-]

Interesting. Won't stuff like entity extraction suffer? Especially in multilingual use cases. My worry is that a smaller model might not realize some text is actually a person's name because it is very unusual.

kergonath 3 hours ago | parent [-]

The model does not need to be that smart to understand that an unfamiliar word starting with a capital letter is the name of a place or a person. It does not need to know who the name refers to; it just needs to transcribe it.
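A minimal sketch of that division of labour, assuming spaCy and its small English pipeline are installed (the OCR output string and the name in it are made up):

    import spacy

    # Pretend this string came from an OCR pass over a scanned page.
    ocr_text = "Invoice issued to Xochitl Abarquero, Villeurbanne, 12 May 2024."

    # Entity extraction runs downstream of OCR; capitalization and sentence
    # position are usually enough for the tagger to flag an unfamiliar
    # name as a PERSON entity, even if it has never seen that name.
    nlp = spacy.load("en_core_web_sm")
    for ent in nlp(ocr_text).ents:
        print(ent.text, ent.label_)

The point being: the OCR model only needs to get the characters right; recognizing what the text refers to can be a separate, cheap step.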

Also, there are generalist models with a decent grasp of a dozen or so languages that fit comfortably in 7B parameters. The older Mistral, for example, had the best multilingual support at the time, and newer models around that size are probably good candidates too. I am not surprised that a specialised multilingual model can fit in 8B or so.
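For scale, a rough sketch of what running a model of that size locally looks like, assuming the Hugging Face transformers library; the Mistral checkpoint and the French prompt are just illustrative examples:

    from transformers import pipeline

    # Any ~7B instruct checkpoint works here; this Mistral one is just an
    # example of the generation mentioned above.
    pipe = pipeline("text-generation",
                    model="mistralai/Mistral-7B-Instruct-v0.2")

    # A non-English prompt, to exercise the multilingual side.
    out = pipe("Transcris ce texte tel quel, sans le corriger.",
               max_new_tokens=40)
    print(out[0]["generated_text"])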

woeirua 3 hours ago | parent | prev [-]

No. Gemini is clearly the leader across the board: https://www.ocrarena.ai/leaderboard