philipodonnell a day ago:
Is the difficulty that in high-entropy situations, you can't really tell whether it's because the model is uncertain, or because the options are so semantically similar that it doesn't matter which one you choose? Like pure synonyms.
scottmu a day ago:
If 2 (or more) tokens are synonymous with each other and carry high probabilities (49.9% each, for a total of 99.8%), that's still low entropy. Not as low as a single high-probability token, but low enough for us to consider it a low-entropy token distribution.

You can't look at a single token distribution in isolation, though. There are many legitimate high-confidence, high-accuracy cases in which many tokens could plausibly come next — for example, the first token of a paragraph. You need to look at pools of entropies over segments of the output, or over the whole output sequence.

Although there's a correlation between uncertainty and hallucinations or inaccuracies, there's no guarantee. This is a challenging area; we're monitoring the latest literature and contributing where we can.
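A quick sketch of the entropy arithmetic above (Shannon entropy in bits; the probability values are illustrative, not from any real model):

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# One dominant token: near-zero entropy.
single = entropy_bits([0.998, 0.001, 0.001])

# Two synonymous tokens splitting the mass (49.9% each):
# entropy is ~1 bit -- higher than the single-token case,
# but still low relative to a broad distribution.
synonyms = entropy_bits([0.499, 0.499, 0.002])

# Many plausible continuations (e.g. the first token of a
# paragraph): entropy is much higher, yet the model may be
# perfectly "confident" in the sense that any choice is fine.
broad = entropy_bits([0.02] * 50)

print(single, synonyms, broad)
```

This is why a single token's entropy is ambiguous on its own: the two-synonym case and the start-of-paragraph case both raise entropy for different reasons, and only aggregating over segments of the output separates genuine uncertainty from benign branching.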