measurablefunc 6 days ago:
Where is the logical mistake in the linked argument? If there is a mistake, then I'd like to know what it is & the counter-example that invalidates the logical argument.
versteegen 6 days ago:
A Transformer with a length n context window implements an order 2n-1 Markov chain¹. That is correct. It is also irrelevant in the real world, because LLMs aren't run for that many tokens (results degrade long before then). Before it hits that limit, there is nothing requiring it to have any of the properties of a Markov chain. In fact, because the state space is k^n (alphabet size k), you might not revisit a state until generating k^n tokens.

¹ Depending on context window implementation details, but that is the maximum, because the states n tokens back were computed from the n tokens before that. The minimum of course is an order n-1 Markov chain.
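A minimal sketch of the definitional point, in Python (a toy stand-in, not a real Transformer; next_token_dist, k, and n are invented for illustration): any sampler whose next-token distribution is a pure function of its last n tokens is, by definition, a finite-order Markov chain over a state space of size k^n, so a state need not repeat for up to k^n generated tokens.

    import random

    k = 4   # toy alphabet size
    n = 3   # toy context window length

    def next_token_dist(state):
        # Stand-in for the model: the distribution is derived from the
        # state (the last n tokens) alone, so the Markov property holds
        # by construction.
        rng = random.Random(hash(state))
        weights = [rng.random() for _ in range(k)]
        total = sum(weights)
        return [w / total for w in weights]

    def generate(prompt, steps):
        tokens = list(prompt)
        for _ in range(steps):
            state = tuple(tokens[-n:])   # the Markov state: last n tokens
            dist = next_token_dist(state)
            tokens.append(random.choices(range(k), weights=dist)[0])
        return tokens

    print(generate([0, 1, 2], 20))
    print("distinct states:", k ** n)    # pigeonhole forces a state repeat only after k^n steps

Whether a real Transformer's effective order is n-1 or 2n-1 then comes down to the caching details in the footnote above; the sketch only illustrates why the state space, and hence the possible recurrence time, scales as k^n.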