efitz 6 days ago

Having a dictionary is a prerequisite, but it is only a small part of the spell check problem. Plus, plain text word lists were slow to parse and search on 80s hardware; you were better off with a trie or some other exotic tree structure that is naturally compressed and whose lookup time depends on the length of the word rather than an O(n) scan of the whole list.
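To make the trie idea concrete, here is a minimal sketch in Python (obviously not what anyone ran in the 80s). The class and method names are mine, purely for illustration. The point is that a lookup walks one node per character, so its cost tracks the length of the word and not the size of the dictionary, and words with a common prefix share nodes.

```python
class TrieNode:
    """One node per character; shared prefixes share nodes."""
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True if a dictionary word ends here


class Trie:
    """Hypothetical minimal trie, for illustration only."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def __contains__(self, word):
        # Walk one node per character; stop early on a missing branch.
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


dictionary = Trie(["can", "cane", "cat"])
print("can" in dictionary)  # True
print("ca" in dictionary)   # False: a prefix, but not a stored word
```

A real implementation would pack this far more tightly (a DAWG or similar), but the lookup logic is the same shape.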

The computer has to figure out whether the word is in the dictionary, but it also has to figure out a suggestion for what to change it to.

And even after just that, we already have a bug: homonym mistakes. Homonyms are in the dictionary but they’re misspelled (that was intentional btw).

How badly misspelled is another problem. We’ve had Levenshtein and similar edit-distance algorithms for a long time, but how large a distance do you allow? A really badly misspelled word might not have any good replacement candidates within your edit distance limit.
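A rough sketch of the edit distance limit problem, in Python. This is the textbook dynamic-programming Levenshtein distance plus a brute-force scan; the `suggest` helper and its `max_distance` parameter are made-up names for illustration, and a real checker would not compare against every dictionary word.

```python
def levenshtein(a, b):
    """Classic edit distance: inserts, deletes, and substitutions cost 1 each."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or free if they match
            ))
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_distance=2):
    """Hypothetical helper: candidates within max_distance, closest first."""
    scored = sorted((levenshtein(word, w), w) for w in dictionary)
    return [w for distance, w in scored if distance <= max_distance]


words = ["spelling", "spewing", "selling", "physics"]
print(suggest("speling", words))  # ['spelling', 'spewing', 'selling']
print(suggest("fizzix", words))   # []: nothing within the limit
```

Raise the limit far enough to catch "fizzix" and you also start pulling in junk for every mildly misspelled word, which is the tradeoff.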

There are also optimizations like frequently mistyped words (acn-> can), acronyms, etc.
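The frequently-mistyped-words shortcut can be as simple as a lookup table consulted before any expensive search. A minimal Python sketch; the table contents other than acn -> can and the function name `quick_correct` are invented for illustration.

```python
# Hypothetical table of common typos; only "acn" comes from the comment above.
COMMON_TYPOS = {
    "acn": "can",
    "teh": "the",
    "adn": "and",
}


def quick_correct(word, dictionary):
    """Return the word if known, a table fix if listed, else None."""
    if word in dictionary:
        return word
    # None means "fall through to the slower edit-distance search".
    return COMMON_TYPOS.get(word.lower())


known = {"can", "the", "and"}
print(quick_correct("acn", known))    # can
print(quick_correct("xyzzy", known))  # None
```

The table is cheap, and it catches transpositions that plain Levenshtein scores as two edits instead of one.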

It was never just about size.