astrange a day ago

The difference is that an LLM has exponentially more states than an n-gram model, since the state count grows exponentially with context length. It's really not the same thing at all. An LLM can perform nearly arbitrary computation inside its fixed-size memory.

https://arxiv.org/abs/2106.06981

(An LLM with tool use isn't a Markov process at all, of course.)
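
To put rough numbers on the state-count point, here's a quick sketch. It treats both models as Markov chains whose state is the last k tokens, so the number of states is vocab_size ** k. The vocabulary size and context length are made-up illustrative values, not figures for any particular model, and the helper name is hypothetical.

```python
import math


def log10_state_count(vocab_size: int, context_len: int) -> float:
    """Return log10 of vocab_size ** context_len, the number of distinct contexts.

    Working in log space avoids building an integer with thousands of digits.
    """
    return context_len * math.log10(vocab_size)


# Assumed, illustrative sizes (not taken from any specific model).
VOCAB_SIZE = 50_000
TRIGRAM_CONTEXT = 2    # a trigram model conditions on the previous 2 tokens
LLM_CONTEXT = 2_048    # an assumed context window, in tokens

trigram_digits = log10_state_count(VOCAB_SIZE, TRIGRAM_CONTEXT)
llm_digits = log10_state_count(VOCAB_SIZE, LLM_CONTEXT)

print(f"trigram states ~ 10^{trigram_digits:.1f}")
print(f"LLM states     ~ 10^{llm_digits:.1f}")
```

With these assumed sizes the trigram model has on the order of 10^9 states and the window-as-state model has a count with thousands of digits. This only counts states; it says nothing about what either model computes over them.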