andrewflnr | 5 days ago
> It's very likely that they'll give you a funny surprising answer.

That's entirely the wrong level of abstraction at which to apply the concept of "surprise". It's the actual tokens in the comedian's answer that will be surprising in the relevant way. (It's still true that surprising-but-inevitable is very difficult in any form.)
albertzeyer | 5 days ago
It's not about the probability of individual tokens; it's about the probability of the whole sequence of tokens, i.e. the whole answer. If the model is good (or the human comedian is good), a good, funny joke should have a higher probability as the response to the question than a not-so-funny joke. When you use the chain rule of probability to break the probability of the sequence down into per-token conditional probabilities, yes, some of those tokens might have a low probability (and at some positions there may be other tokens with a higher probability). But what counts is the overall probability of the sequence. That's why greedy search, which just takes the most likely token at each step, is not necessarily the best. A better search algorithm tries to find the most likely whole sequence, e.g. beam search, which approximates this by keeping the k best partial sequences at each step. (But then, people also do nucleus sampling, which is maybe again a bit counterintuitive...)
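To make the greedy-vs-beam point concrete, here is a minimal sketch on a made-up toy "model". TOY_MODEL, next_token_probs, greedy_decode and beam_search are hypothetical names for illustration, not any library's API. The sequence score uses the chain rule, log P(y) = sum over t of log P(y_t | y_<t). Greedy takes the locally best first token ("A", 0.6) and ends up with a sequence of probability 0.24. A beam of width 2 keeps the less likely first token "B" around long enough to find "B w", whose whole-sequence probability is 0.36.

```python
import math

EOS = "<eos>"

# Hypothetical toy "language model": maps a prefix (tuple of tokens) to a
# distribution over the next token. Any prefix not listed just ends (EOS).
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.4, "y": 0.3, "z": 0.3},
    ("B",): {"w": 0.9, "v": 0.1},
}


def next_token_probs(prefix):
    """P(next token | prefix) under the toy model."""
    return TOY_MODEL.get(tuple(prefix), {EOS: 1.0})


def sequence_log_prob(tokens):
    """Chain rule: log P(y) = sum_t log P(y_t | y_<t)."""
    return sum(
        math.log(next_token_probs(tokens[:t])[tok]) for t, tok in enumerate(tokens)
    )


def greedy_decode(max_len=10):
    """Pick the single most likely next token at every step."""
    seq = []
    while len(seq) < max_len:
        probs = next_token_probs(seq)
        tok = max(probs, key=probs.get)  # locally best token only
        seq.append(tok)
        if tok == EOS:
            break
    return tuple(seq)


def beam_search(beam_width=2, max_len=10):
    """Keep the beam_width best partial sequences by total log prob."""
    beams = [((), 0.0)]  # (tokens, log prob)
    finished = []
    for _ in range(max_len):
        # Expand every live beam by every possible next token.
        candidates = [
            (seq + (tok,), lp + math.log(p))
            for seq, lp in beams
            for tok, p in next_token_probs(seq).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, lp in candidates[:beam_width]:
            # Completed sequences leave the beam; the rest keep growing.
            (finished if seq[-1] == EOS else beams).append((seq, lp))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])[0]


if __name__ == "__main__":
    for name, seq in [("greedy", greedy_decode()), ("beam", beam_search())]:
        print(name, seq, round(math.exp(sequence_log_prob(seq)), 2))
    # greedy ('A', 'x', '<eos>') 0.24
    # beam ('B', 'w', '<eos>') 0.36
```

With beam_width=1 this beam search reduces to greedy decoding. Beam search is still only an approximation: a wide enough beam is needed to be sure of the most likely sequence in general.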