| ▲ | pbhjpbhj 3 hours ago | |
You almost don't want [super-]word level ML (i.e. word-pair/phrase/sentence/document/corpus level). In transcription, you want near certainty, or you want a marking that the word could not be read with certainty. Yes, context lets you guess, but for some OCR you want to know when it's a guess based on something other than the letters, in order, forming a word. For example, in a census document on familysearch.com the transcriber "corrected" a name to Joseph. The literal letters in the handwritten document spell Josepth ... and sure enough that's a local variant spelling (Eire). In another document the writer used "Joh" as an abbreviation, and a [human, I assume] transcriber put that as John ... which is most likely, but happens to be wrong. Sometimes you care that it's guessed, sometimes you just want the best guess. | ||
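To make the "literal reading vs. guess" distinction concrete, here is a minimal sketch of how a transcription record could keep both. The `Token` class, its field names, and the bracket notation are all hypothetical, made up for illustration; this is not how FamilySearch or any particular OCR tool stores its output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """One transcribed word (hypothetical structure, for illustration only)."""
    literal: str                      # the letters exactly as read from the page
    suggestion: Optional[str] = None  # context-based guess, if the transcriber made one
    legible: bool = True              # False when the letters themselves are uncertain

    @property
    def is_guess(self) -> bool:
        # A guess is any suggestion that differs from the literal letters.
        return self.suggestion is not None and self.suggestion != self.literal

    def render(self, best_guess: bool = False) -> str:
        if best_guess:
            # Reader only wants the most likely word.
            return self.suggestion or self.literal
        # Reader wants the literal text, with uncertainty and guesses marked.
        text = self.literal if self.legible else f"[{self.literal}?]"
        if self.is_guess:
            text += f" [sugg. {self.suggestion}]"
        return text


name = Token("Josepth", suggestion="Joseph")
abbrev = Token("Joh", suggestion="John")
print(name.render())                   # literal first, guess marked as a guess
print(abbrev.render(best_guess=True))  # just the best guess
```

The point of keeping both fields is that the reader picks the view: the literal form with the guess flagged, or just the best guess, without the original letters ever being overwritten.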
| ▲ | messe 2 hours ago | parent [-] | |
> Eire

A nitpick, because it's often a dogwhistle: almost nobody in Ireland calls it that when speaking English. And it's still incorrect in Irish; the correct spelling is Éire. | ||