astrange 2 days ago
An LLM is not a Markov chain of the input tokens, because it has internal computational state (the KV cache and residuals). An LLM is a Markov process if you include its entire state, but that's a pretty degenerate definition.
Jensson a day ago | parent
> An LLM is a Markov process if you include its entire state, but that's a pretty degenerate definition.

Not any more degenerate than a multi-word bag-of-words Markov chain; it's exactly the same concept: you input a context of words/tokens and get a new word/token. The things you mention are just optimizations around that abstraction.
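To make the comparison concrete, here's a minimal sketch (hypothetical names, not from either comment) of a multi-token-context Markov chain: the next token depends only on the current context window, the same "context in, next token out" interface an LLM exposes. An LLM just replaces the lookup table with a neural network and caches intermediate computation (the KV cache) as an optimization.

```python
import random
from collections import defaultdict

def train(tokens, n=2):
    """Map each n-token context to the tokens observed after it."""
    table = defaultdict(list)
    for i in range(len(tokens) - n):
        context = tuple(tokens[i:i + n])
        table[context].append(tokens[i + n])
    return table

def generate(table, context, length=10, seed=0):
    """Sample tokens one at a time; each step depends only on the
    last n tokens, which is the Markov property."""
    rng = random.Random(seed)
    out = list(context)
    for _ in range(length):
        followers = table.get(tuple(out[-len(context):]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return out
```

With n set to the full context length, this is the same abstraction Jensson describes: the "state" is just the token window itself.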