astrange 2 days ago

An LLM is not a Markov chain over its input tokens, because it has internal computational state (the KV cache and residual stream).

An LLM is a Markov process if you include its entire state, but that's a pretty degenerate definition.
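As a rough sketch of what "including its entire state" means: if you treat the whole context window as the state, each generation step depends only on that state. (`next_token_distribution` here is a hypothetical stand-in for one forward pass; a real model would return softmax over logits.)

```python
import random

# Hypothetical stand-in for one forward pass over the current context.
def next_token_distribution(context):
    return {0: 0.7, 1: 0.3}  # placeholder probabilities

def step(context):
    # One Markov transition: the next state depends only on the
    # current state (the full context), not on anything earlier.
    dist = next_token_distribution(context)
    nxt = random.choices(list(dist), weights=list(dist.values()))[0]
    return context + (nxt,)

state = (101, 7, 42)  # the "state" is the entire context window
for _ in range(5):
    state = step(state)
print(state)
```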

Jensson a day ago

> An LLM is a Markov process if you include its entire state, but that's a pretty degenerate definition.

Not any more degenerate than a multi-word bag-of-words Markov chain; it's exactly the same concept: you input a context of words/tokens and get a new word/token. The things you mention are just optimizations around that abstraction.
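For comparison, a minimal sketch of that n-gram abstraction (the corpus and the `train_ngram`/`generate` names are illustrative, not from any particular library): the "state" is the last n-1 words, and the transition table maps state to next word.

```python
import random
from collections import defaultdict

def train_ngram(corpus, n=3):
    # Map each (n-1)-word context to the words that followed it.
    table = defaultdict(list)
    for i in range(len(corpus) - n + 1):
        context = tuple(corpus[i : i + n - 1])
        table[context].append(corpus[i + n - 1])
    return table

def generate(table, context, steps=10):
    out = list(context)
    for _ in range(steps):
        candidates = table.get(tuple(out[-len(context):]))
        if not candidates:
            break
        out.append(random.choice(candidates))
    return out

corpus = "the cat sat on the mat the cat ran".split()
table = train_ngram(corpus, n=3)
print(generate(table, ("the", "cat")))
```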

astrange a day ago

The difference is that there are exponentially more states than in an n-gram model. It's really not the same thing at all: an LLM can perform nearly arbitrary computation inside its fixed-size memory.
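Back-of-the-envelope numbers for that gap (the vocabulary size and context length below are illustrative, not taken from any particular model):

```python
import math

V = 50_000      # illustrative vocabulary size
ctx = 2048      # illustrative LLM context window, in tokens

# A typical trigram model conditions on the previous 2 words:
trigram_contexts = V ** 2                  # 2.5e9 distinct states
# An LLM "as a Markov chain" conditions on the whole window:
llm_contexts_log10 = ctx * math.log10(V)   # log10 of V**ctx

print(f"trigram states: {trigram_contexts:.1e}")
print(f"LLM states:     ~10^{llm_contexts_log10:.0f}")
# And this still ignores the continuous KV-cache / residual state.
```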

https://arxiv.org/abs/2106.06981 (Thinking Like Transformers)

(An LLM with tool use isn't a Markov process at all, of course.)