vrighter a day ago
Building a static table of seen inputs is just one way of building a state transition table for a Markov chain. It is an implementation detail of a function (in the mathematical sense of the word: no side effects) that takes some input and outputs a probability distribution. You could make the table bigger and fill in the rest yourself, or you could use machine learning to do it. But then the table would be too huge to actually store, due to combinatorial explosion, so we find a way to reduce that memory cost: don't precompute the whole table, and instead lazily evaluate individual cells as and when they are needed. You achieve that by passing the input through the machine-learned function (a trained network is a fixed function with no side effects).

You might say that's not the same thing! But remember that the learned network will always output the same distribution when given the same input, because it is a function. Say you have a context size of 1000 and 10 possible tokens. There are 10^1000 possible inputs: a huge number, but most importantly, a finite number. So you could in theory feed them all in, one at a time, and record the results in a table. We can't actually do this in practice, because the resulting table would be enormous, but the two forms are, for all mathematical purposes, equivalent, and you could in principle freely transform one into the other.

Et voila! You have built a Markov chain anyway. Previous input goes in, magic happens inside (whichever implementation you used for the function doesn't matter), and a probability distribution comes out. It's a Markov chain. It doesn't just quack and walk like a duck. It IS an actual duck.
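A minimal sketch of that equivalence, assuming a toy setup of 2 tokens and a context length of 3 instead of 10 tokens and length 1000, with a made-up deterministic toy_model function standing in for a trained network:

    from itertools import product

    TOKENS = (0, 1)
    CONTEXT_LEN = 3

    def toy_model(context):
        """Stand-in for a trained network: any fixed, side-effect-free
        function from a context tuple to a distribution over next tokens."""
        h = sum((i + 1) * t for i, t in enumerate(context)) % 7
        p1 = (h + 1) / 8      # deterministic: same context -> same distribution
        return (1.0 - p1, p1)

    # "Precompute the whole table": enumerate every possible context once and
    # record the function's output. This is the Markov chain's transition table.
    table = {ctx: toy_model(ctx) for ctx in product(TOKENS, repeat=CONTEXT_LEN)}

    # The two representations are interchangeable: lazy evaluation through the
    # function and lookup in the precomputed table give identical distributions.
    for ctx in product(TOKENS, repeat=CONTEXT_LEN):
        assert table[ctx] == toy_model(ctx)

    print(f"{len(table)} states, e.g. P(next | (1, 0, 1)) = {table[(1, 0, 1)]}")

Here the table only has 2^3 = 8 rows, so both forms are cheap; with 10^1000 possible contexts only the lazy, function-shaped form remains storable, which is the whole point of the argument above.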