| ▲ | bthornbury 2 hours ago | |
Something like a perplexity/log-likelihood measurement across a large enough number of prompts/tokens might get you the same in a statistical sense though. I expect those comparison percentages at the top are something like that. | ||