Remix clone Hacker News

new | show | ask | jobs Github

	▲	tgv 6 days ago
		For checking? Just a lookup on disk (no db, just a large list with a custom index, then binary search in the retrieved block). Decoding anything was slow, and in-core was basically out of the question [1]. Caching was important, though, since just a handful of words make up 50% of the text. I once built a spell checker plus corrector which had to run in 32kB under a DOS hotkey, interacting with some word processor. On top of that, it had to run from CD ROM, and respond within a second. I could do 4 lookups, in blocks of 8kB, which gave me the option to look up the word in normal order, in reverse order, and a phonetic transcription in both directions. Each 8kB block contained quite a few words, can't remember how many. Then counting the similarities, and returning them as a sorted list. It wasn't perfect, but worked reasonably well. [1] Adding that for professional spell checking you'd need at least 100k lemmata plus all inflections plus information per word if you have to accept compounds/agglutination.