monkeywork 4 days ago

Do you have an automated method of doing the filtering, or is this all manual?

yorwba 4 days ago

The sorting is automated.

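  from collections import Counter, defaultdict

  # Assumes `sentences` (a list of strings) and `words(s)` (a tokenizer) are defined elsewhere.

  # How often each word occurs across the whole corpus.
  word_count = Counter(w for s in sentences for w in words(s))

  # Index from each word to the sentences that contain it.
  sentences_by_word = defaultdict(list)
  for s in sentences:
    for w in set(words(s)):  # set() so a repeated word does not list the sentence twice
      sentences_by_word[w].append(s)

  # Ascending corpus counts of a sentence's distinct words, rarest first.
  sentence_sort_key = lambda s: sorted(word_count[w] for w in set(words(s)))

  # From the most common word down, show up to 5 sentences whose rarest words are the most common.
  for w, _ in word_count.most_common():
    candidates = sorted(sentences_by_word[w], key=sentence_sort_key, reverse=True)[:5]
    for c in candidates:
      print(w, ':', c)
    input()  # wait for Enter before moving on to the next word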
(Add epicycles for defining what a word is and what a sentence is, ensuring the candidate sentences have varying lengths, keeping track of which words and sentences were already seen...)
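
For illustration, here is a rough, self-contained sketch of what a couple of those epicycles might look like. Everything in it is an assumption rather than the actual script: `split_sentences`, `words`, and `candidates_per_word` are hypothetical names, the regex tokenizer only makes sense for languages that separate words with spaces, and "already seen" is interpreted as "already shown as a candidate for an earlier word". Varying the candidate lengths is not covered.

```python
import re
from collections import Counter, defaultdict

def split_sentences(text):
    # Hypothetical splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def words(sentence):
    # Hypothetical tokenizer: lowercase runs of word characters.
    return re.findall(r'\w+', sentence.lower())

def candidates_per_word(sentences, per_word=5):
    # Drop duplicate sentences while keeping their original order.
    sentences = list(dict.fromkeys(sentences))

    word_count = Counter(w for s in sentences for w in words(s))

    sentences_by_word = defaultdict(list)
    for s in sentences:
        for w in set(words(s)):
            sentences_by_word[w].append(s)

    def sort_key(s):
        # Ascending corpus counts of the sentence's distinct words, rarest first.
        return sorted(word_count[w] for w in set(words(s)))

    seen_sentences = set()
    for w, _ in word_count.most_common():
        # Only offer sentences that were not already shown for an earlier word.
        fresh = [s for s in sentences_by_word[w] if s not in seen_sentences]
        picks = sorted(fresh, key=sort_key, reverse=True)[:per_word]
        seen_sentences.update(picks)
        if picks:
            yield w, picks

if __name__ == '__main__':
    text = 'The cat sat. The cat ran. A dog ran! The cat sat.'
    for w, picks in candidates_per_word(split_sentences(text)):
        for c in picks:
            print(w, ':', c)
```

Writing it as a generator keeps the ranking separate from the interactive part, so the `print`/`input()` loop (or anything else that does the manual picking) can sit on top of it.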

The final step of choosing one sentence and turning it into an Anki flashcard is manual.