dragonwriter 2 days ago
I think, strictly speaking, autoregressive LLMs are Markov chains of a very high order. The trick (aside from the order) is the training process by which they are derived from their source data. Simply enumerating the states and transitions in the source data, along with the probability of each transition from each state, doesn't get you an LLM.
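For concreteness, a minimal sketch of that enumeration approach: an order-n model built purely by counting transitions in the source tokens and sampling from the resulting frequencies. The code and names (e.g. build_markov_model) are illustrative, not taken from anyone's actual implementation.

    # Order-n Markov chain from raw counts: state -> next-token frequencies.
    import random
    from collections import defaultdict, Counter

    def build_markov_model(tokens, order=2):
        transitions = defaultdict(Counter)
        for i in range(len(tokens) - order):
            state = tuple(tokens[i:i + order])
            transitions[state][tokens[i + order]] += 1
        return transitions

    def sample_next(transitions, state):
        counts = transitions[state]
        # Transition probability is just count / total, sampled directly.
        return random.choices(list(counts), weights=list(counts.values()))[0]

    tokens = "the cat sat on the mat and the cat slept".split()
    model = build_markov_model(tokens, order=2)
    print(sample_next(model, ("the", "cat")))  # 'sat' or 'slept'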
krackers a day ago | parent | next
I always like to think of LLMs as Markov models in the way that real-world computers are finite state machines: it's technically true, but not a useful level of abstraction at which to analyze them. Both LLMs and n-gram models satisfy the Markov property, and you could in principle go through and compute explicit transition matrices (something like vocab_size^context_size states, each with vocab_size outgoing probabilities, I think). But LLMs aren't trained as n-gram models, so beyond giving you the autoregressive structure, there's not much you learn by viewing one as a Markov model.
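A rough back-of-the-envelope for why that explicit transition table is only notionally computable; the vocabulary and context sizes below are assumed, GPT-2-scale numbers, not figures from the thread.

    # State count for a Markov chain whose order equals the context length.
    import math

    vocab_size = 50257     # assumed vocabulary size
    context_size = 1024    # assumed context length (Markov order)

    # Every possible context window is a distinct state.
    log10_states = context_size * math.log10(vocab_size)
    print(f"states ~ 10^{log10_states:.0f}")          # ~ 10^4814

    # Each state carries vocab_size outgoing transition probabilities.
    log10_entries = log10_states + math.log10(vocab_size)
    print(f"table entries ~ 10^{log10_entries:.0f}")  # ~ 10^4819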
| |||||||||||||||||
JPLeRouzic a day ago | parent | prev
Yes, I agree: my code includes a good tokenizer, not a simple word splitter.
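To illustrate the difference, a small sketch with tiktoken's GPT-2 encoding standing in for whatever tokenizer the project actually uses:

    # Naive word splitting vs. subword (BPE) tokenization.
    # tiktoken's GPT-2 encoding is used purely as an illustrative stand-in.
    import tiktoken

    text = "Tokenization isn't just whitespace splitting!"

    # Word splitter: punctuation sticks to words, rare words stay opaque.
    print(text.split())

    # Subword tokenizer: breaks text into reusable pieces with integer ids.
    enc = tiktoken.get_encoding("gpt2")
    ids = enc.encode(text)
    print([enc.decode([i]) for i in ids])  # e.g. 'Token', 'ization', ' isn', "'t", ...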