Remix.run Logo
Someone 6 days ago

For the basic word list, possibly tries (https://en.wikipedia.org/wiki/Trie), DAGs (https://en.wikipedia.org/wiki/Directed_acyclic_graph#Data_co...), or Bloom filter (https://en.wikipedia.org/wiki/Bloom_filter)

Aurornis 6 days ago | parent | next [-]

The article is about fitting large dictionaries into small memory footprints. Writing a 200K word spell checker on a machine with only 256K memory.

When you need to store your dictionary in under 1 byte per word, a trie won't cut it.

bazzargh 6 days ago | parent | prev | next [-]

The limit given in the article is 360KB (on floppy). At that size, you can't use Tries, you need lossy compression. A Bloom filter can get you 1 in 359 false positives with the size of word list given https://hur.st/bloomfilter/?n=234936&p=&m=360KB&k=

The error rate goes up to 1 in 66 for 256KB (in memory only);

tetraodonpuffer 6 days ago | parent | prev [-]

according to https://en.wikipedia.org/wiki/Ispell ispell (1971) already used Levenshtein Distance (although from the article it is not stated if this already existed in the original version, or if it was added in later years).

Someone 5 days ago | parent [-]

Levenshtein distance up to 1, according to that article. If you have a hierarchical structure (trie or a DAG; in some sense, a DAG is a trie, but stored more efficiently, with the disadvantage that adding or removing words is hard) with valid words, it is not hard to check what words satisfy that. If you only do the inexact search after looking for the exact word and finding it missing I think it also won’t be too slow when given ‘normal’ text to spell-check.