Remix.run Logo
spdustin 10 months ago

My answer to this in my own pet project is to mask terms found by the NER pipeline from being corrected, replacing them with their entity type as a special token (e.g. [male person] or [commercial entity]). That alone dramatically improved grammar/spelling correction, especially because the grammatical "gist" of those masked words is preserved in the text presented to the LLM for "correction".