thaumasiotes | a day ago
> This means that gibberish options for the next token have non-zero probabilities of being chosen. The only reason they don't in reality is because of top-k sampling, temperature, and other filtering that's done on the logits before actually choosing a token.

> If you present a s_1, s_2, ... s_N to a Markov Chain when that series was never seen by the chain

No, you're confused. The chain has never seen anything. The Markov chain is a table of probability distributions. You can create it by any means you see fit. There is no such thing as a "series" of tokens that has been seen by the chain.
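As a minimal sketch of what "the chain is just a table" means (Python, with a hand-written toy table of my own): the entire model is a mapping from context tuples to distributions over next tokens, and nothing about it requires a corpus or any "seen" sequences.

    import random

    # A word-level Markov chain is only this: a table mapping each context
    # to a probability distribution over next tokens. Written by hand here;
    # no training material involved.
    chain = {
        ("my", "cat"): {"sat": 0.5, "purred": 0.5},
        ("cat", "sat"): {"on": 1.0},
    }

    def sample_next(chain, context):
        dist = chain[context]
        tokens, weights = zip(*dist.items())
        return random.choices(tokens, weights=weights)[0]

    print(sample_next(chain, ("my", "cat")))  # 'sat' or 'purred', never anything else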
Sohcahtoa82 | 20 hours ago | parent
> The chain has never seen anything. The Markov chain is a table of probability distributions. You can create it by any means you see fit. There is no such thing as a "series" of tokens that has been seen by the chain.

When I talk about the chain "seeing" a sequence, I mean that the sequence existed in the material that was used to generate the probability table. My instinct is to believe that you know this, but are being needlessly pedantic.

My point is that if you're using a context length of two and you prompt a Markov chain with "my cat", but the sequence "my cat was" never appeared in the training material, then the Markov chain will never choose "was" as the next word.

This property does not hold for LLMs. If you prompt an LLM with "my cat", then "was" has a non-zero chance of being chosen as the next word, even if "my cat was" never appeared in the training material.
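A hedged sketch of that contrast (Python; the toy corpus and the logits are my own invention, not anyone's real model): a count-based chain with context length two assigns exactly zero probability to any next word whose (context, word) combination never occurred in the material the table was built from, while a softmax over logits, as in an LLM's output layer, assigns strictly positive probability to every token in the vocabulary.

    import math
    from collections import Counter, defaultdict

    def build_chain(tokens, context_len=2):
        # Count-based Markov chain: context tuple -> Counter of next tokens.
        table = defaultdict(Counter)
        for i in range(len(tokens) - context_len):
            table[tuple(tokens[i:i + context_len])][tokens[i + context_len]] += 1
        return table

    corpus = "my cat sat on the mat and my cat purred loudly".split()
    chain = build_chain(corpus)

    dist = chain[("my", "cat")]
    total = sum(dist.values())
    print({w: n / total for w, n in dist.items()})  # {'sat': 0.5, 'purred': 0.5}
    print(dist["was"] / total)                      # 0.0 -- "was" can never be chosen

    def softmax(logits):
        m = max(logits.values())
        exps = {w: math.exp(v - m) for w, v in logits.items()}
        z = sum(exps.values())
        return {w: e / z for w, e in exps.items()}

    # Made-up logits standing in for an LLM's output after the prompt "my cat":
    probs = softmax({"sat": 4.1, "purred": 3.8, "was": 1.2, "xylophone": -6.0})
    print(all(p > 0 for p in probs.values()))       # True -- every token is possible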