astrange a day ago
The difference is that an LLM has exponentially more states than an n-gram model. It's really not the same thing at all. An LLM can perform nearly arbitrary computation inside its fixed-size memory. https://arxiv.org/abs/2106.06981 (An LLM with tool use isn't a Markov process at all, of course.)
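
To put rough numbers on "exponentially more states": if you treat the conditioning context as the Markov state, the number of possible states is vocab_size ** context_length. Here is a back-of-the-envelope sketch, assuming a 50,000-token vocabulary and a 2,048-token context window (both illustrative figures of mine, not taken from any particular model), compared against a trigram model that conditions on 2 tokens:

```python
import math


def log10_states(vocab_size: int, context_tokens: int) -> float:
    """Return log10 of vocab_size ** context_tokens, the number of
    distinct contexts of exactly `context_tokens` tokens."""
    return context_tokens * math.log10(vocab_size)


VOCAB = 50_000        # assumed vocabulary size (illustrative)
TRIGRAM_CONTEXT = 2   # a trigram model conditions on the previous 2 tokens
LLM_CONTEXT = 2_048   # assumed context window (illustrative)

trigram = log10_states(VOCAB, TRIGRAM_CONTEXT)
llm = log10_states(VOCAB, LLM_CONTEXT)

print(f"trigram: about 10^{trigram:.0f} states")
print(f"LLM:     about 10^{llm:.0f} states")
```

The state count grows exponentially with context length, so it goes from a number you could store in a table to one far too large to ever enumerate.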