amluto 5 hours ago

Off the top of my head: for a lot of OCR tasks, it’s kind of worse for the model to be smart. I don’t want my OCR to make stuff up or answer questions — I want it to recognize what is actually on the page.

rdos 4 hours ago

Interesting. Won't stuff like entity extraction suffer? Especially in multilingual use cases. My worry is that a smaller model might not realize some text is actually a person's name because it is very unusual.

kergonath 3 hours ago

The model does not need to be that smart to understand that an unfamiliar word starting with a capital letter is the name of a place or a person. It does not need to know who it refers to; it just needs to transcribe it.

Also, there are generalist models with enough of a grasp of a dozen or so languages that fit comfortably in 7B parameters, like the older Mistral, which had the best multilingual support at the time; newer models around that size are probably good candidates too. I am not surprised that a specialised multilingual model can fit in 8B or so.