thesz a day ago

  > A markov chain model will literally have a matrix entry for every possible combination of inputs.
Less frequent prefixes are usually pruned away, and a backoff penalty is applied when falling back to a shorter prefix. In the end, every word remains in the model's prediction, and a typical n-gram SRILM model is able to generate "the pig with dragon head," albeit with small probability.
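A minimal sketch of that backoff idea (the names and the penalty scheme here are illustrative, not SRILM's actual API — it's closer to "stupid backoff" than to SRILM's discounted backoff): when an n-gram is pruned or unseen, the model recurses on the shorter prefix with a penalty factor, and a uniform floor at the unigram level keeps every word's probability nonzero.

```python
# Toy n-gram counts keyed by token tuples; () holds the total token count.
# Purely illustrative data, not from a real corpus.
counts = {
    (): 100,
    ("the",): 4,
    ("pig",): 5,
    ("the", "pig"): 3,
}
VOCAB_SIZE = 50

def backoff_prob(prefix, word, alpha=0.4):
    """Score P(word | prefix), backing off to shorter prefixes with penalty alpha."""
    ngram = prefix + (word,)
    if ngram in counts and prefix in counts:
        # direct relative-frequency estimate from the stored counts
        return counts[ngram] / counts[prefix]
    if not prefix:
        # uniform floor: even unseen words keep a small nonzero probability
        return 1.0 / VOCAB_SIZE
    # pruned/unseen prefix: pay the penalty and try the shorter context
    return alpha * backoff_prob(prefix[1:], word, alpha)
```

So `backoff_prob(("the",), "pig")` hits the stored bigram directly, while `backoff_prob(("the",), "dragon")` backs off twice and still returns a small nonzero value — which is exactly why "the pig with dragon head" stays generatable.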

Even if you think of the Markov chain's parameters as a tensor (not a matrix), computing probabilities is not a single lookup but a series of folds.
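That "series of folds" view can be sketched as follows — a hypothetical bigram model where the sentence log-probability is a left fold over tokens, each step doing a context-dependent lookup rather than one tensor read (the `logp` table and floor value are made up for illustration):

```python
from functools import reduce
import math

# Toy conditional log-probabilities keyed by (context, word); illustrative only.
logp = {
    (("<s>",), "the"): math.log(0.5),
    (("the",), "pig"): math.log(0.2),
    (("pig",), "flies"): math.log(0.1),
}

def step(state, word):
    """One fold step: accumulate log P(word | context), then shift the context."""
    context, total = state
    # unseen (context, word) pairs fall back to a small floor probability
    total += logp.get((context, word), math.log(1e-6))
    return ((word,), total)  # bigram model: keep only the last token as context

tokens = ["the", "pig", "flies"]
_, log_prob = reduce(step, tokens, (("<s>",), 0.0))
```

Here `log_prob` ends up as log(0.5 * 0.2 * 0.1): three lookups chained through the evolving context, which is the fold the comment is talking about.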