shenberg a day ago
The short and unsatisfying answer is that LLM generation is a Markov chain, except that instead of counting n-grams to build the next-token distribution, the training process compresses those statistics into the LLM's weights. There was an interesting paper a while back that investigated using unbounded n-gram models as a complement to LLMs: https://arxiv.org/pdf/2401.17377 (I found the implementation clever, and I'm somewhat surprised it received so little follow-up work).
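To make the counting side of the analogy concrete, here's a toy sketch of a plain trigram Markov chain: build a (context -> next-token) count table, normalize it into a distribution, and sample from it. This is just an illustration of ordinary n-gram counting, not the unbounded approach in the linked paper; the corpus and the trigram window are arbitrary choices for the example.

    from collections import Counter, defaultdict
    import random

    def train(tokens, n=3):
        # Count how often each next token follows each (n-1)-token context.
        counts = defaultdict(Counter)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n - 1])][tokens[i + n - 1]] += 1
        return counts

    def next_token_dist(counts, context):
        # Normalize raw counts into a probability distribution over the next token.
        c = counts[tuple(context)]
        total = sum(c.values())
        return {tok: cnt / total for tok, cnt in c.items()}

    def generate(counts, context, steps=20):
        out = list(context)
        for _ in range(steps):
            dist = next_token_dist(counts, out[-2:])  # last n-1 = 2 tokens of context
            if not dist:
                break  # unseen context: the bare count table has nothing to say
            toks = list(dist)
            out.append(random.choices(toks, weights=[dist[t] for t in toks])[0])
        return out

    corpus = "the cat sat on the mat the cat ate the rat".split()
    counts = train(corpus)
    print(generate(counts, ["the", "cat"]))

The claim above is that an LLM stands in for the explicit count table: its weights produce the same kind of conditional next-token distribution, just without storing the counts directly.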