AlphaAndOmega0 6 days ago
I'd have liked more explanation of the actual solutions that programmers used at the time.
tgv 6 days ago
For checking? Just a lookup on disk: no database, just a large sorted list with a custom index, then a binary search inside the retrieved block. Decoding anything was slow, and keeping the list in core was basically out of the question [1]. Caching was important, though, since just a handful of words make up 50% of the text.

I once built a spell checker plus corrector that had to run in 32kB under a DOS hotkey, interacting with some word processor. On top of that, it had to run from CD-ROM and respond within a second. I could afford 4 lookups, in blocks of 8kB, which gave me the option to look up the word in normal order, in reverse order, and a phonetic transcription in both directions. Each 8kB block contained quite a few words; I can't remember how many. Then I counted the similarities and returned the candidates as a sorted list. It wasn't perfect, but it worked reasonably well.

[1] Adding that for professional spell checking you'd need at least 100k lemmata plus all inflections, plus per-word information if you have to accept compounds/agglutination.
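For a concrete picture of the lookup scheme tgv describes, here is a minimal sketch in C. Everything in it is illustrative, not the original code: the tiny baked-in "blocks" stand in for 8kB reads from disk or CD-ROM (a real version would fseek()/fread() each block on demand), and the block count, words per block, and word list are invented for the example.

```c
#include <stdio.h>
#include <string.h>

#define NBLOCKS 3
#define WORDS_PER_BLOCK 4

/* The sorted word list, split into fixed-size "disk blocks".
 * Baked in here so the sketch is self-contained; in the real thing
 * only the index below stays in core and each block is read on demand. */
static const char *blocks[NBLOCKS][WORDS_PER_BLOCK] = {
    { "apple", "banana", "cherry", "damson" },
    { "elder", "fig",    "grape",  "kiwi"   },
    { "lemon", "mango",  "olive",  "quince" },
};

/* In-memory index: the first word of each block. */
static const char *first_word[NBLOCKS] = { "apple", "elder", "lemon" };

/* Binary search the index for the last block whose first word <= w. */
static int find_block(const char *w) {
    int lo = 0, hi = NBLOCKS - 1;
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;
        if (strcmp(first_word[mid], w) <= 0)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* One "disk" lookup: fetch the block, then binary search inside it. */
static int contains(const char *w) {
    const char **blk = blocks[find_block(w)];
    int lo = 0, hi = WORDS_PER_BLOCK - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        int c = strcmp(blk[mid], w);
        if (c == 0) return 1;
        if (c < 0)  lo = mid + 1;
        else        hi = mid - 1;
    }
    return 0;
}

int main(void) {
    printf("grape: %d\n", contains("grape"));   /* 1 */
    printf("grapf: %d\n", contains("grapf"));   /* 0 */
    return 0;
}
```

The point of the layout is that only the small first-word-per-block index has to live in memory; each query costs one block read plus an in-block search, which is how a large dictionary stays usable from slow media within a 32kB footprint.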
Someone 6 days ago
For the basic word list, possibly tries (https://en.wikipedia.org/wiki/Trie), DAGs (https://en.wikipedia.org/wiki/Directed_acyclic_graph#Data_co...), or Bloom filters (https://en.wikipedia.org/wiki/Bloom_filter)
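To make the Bloom-filter option concrete, here is a minimal sketch in C. The filter size, hash count, and seeded FNV-1a hash are all illustrative choices, not taken from any particular spell checker. The trade-off it shows: correct words are never rejected, but a misspelling can occasionally slip through as a false positive.

```c
#include <stdint.h>
#include <stdio.h>

#define BITS (1u << 16)   /* 64 Kbit = 8 KB filter (illustrative) */
#define K    4            /* hash functions per word (illustrative) */

static uint8_t filter[BITS / 8];

/* FNV-1a, XORed with a seed so each of the K hashes differs. */
static uint32_t hash(const char *w, uint32_t seed) {
    uint32_t h = 2166136261u ^ seed;
    while (*w) {
        h ^= (uint8_t)*w++;
        h *= 16777619u;
    }
    return h % BITS;
}

/* Set K bits for the word. */
static void add(const char *w) {
    for (uint32_t i = 0; i < K; i++) {
        uint32_t b = hash(w, i * 0x9e3779b9u);
        filter[b / 8] |= (uint8_t)(1u << (b % 8));
    }
}

/* If any of the K bits is clear, the word is definitely absent;
 * if all are set, it is only *probably* present. */
static int maybe_contains(const char *w) {
    for (uint32_t i = 0; i < K; i++) {
        uint32_t b = hash(w, i * 0x9e3779b9u);
        if (!(filter[b / 8] & (1u << (b % 8))))
            return 0;
    }
    return 1;
}

int main(void) {
    add("necessary");
    add("occurrence");
    printf("necessary:  %d\n", maybe_contains("necessary"));   /* 1 */
    printf("neccessary: %d\n", maybe_contains("neccessary"));  /* almost surely 0 */
    return 0;
}
```

The false-positive rate is tuned by the bits allocated per word and the number of hashes, which is why this structure fit era-sized memory budgets: the word list itself never has to be stored.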
ksherlock 6 days ago
https://blog.codingconfessions.com/p/how-unix-spell-ran-in-6... | |||||||||||||||||||||||||||||