-_- 3 hours ago
Author here!

1a. LLMs fundamentally model probability distributions over token sequences; those are the (normalized) logits from the last linear layer of a transformer. The closest thing to ablating temperature is T=0 or T=1 sampling (see the first sketch below).

1b. Yes, you can do something like this, for instance by picking the temperature where perplexity is minimized (second sketch below). Perplexity is the exponential of entropy, to continue the thermodynamic analogy.

1c. Higher than for most AI-written text, around 1.7. I've experimented with this as a metric for distinguishing whether text is written by AI. Human-written text doesn't follow a constant-temperature softmax distribution, either.

2b. Giving an LLM control over its own sampling parameters sounds like it would be a fun experiment! It could have dynamic control to write more creatively or avoid making simple mistakes.

2c. This would produce nonsense. The tokens you get with negative-temperature sampling are "worse than random" (third sketch below).
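Since 1a contrasts T=0 and T=1, here's a minimal sketch of what temperature does at sampling time. The logit values are made up for illustration, not taken from any particular model:

```python
import numpy as np

def sample(logits, temperature, rng):
    """Sample a token index from softmax(logits / T); T=0 means argmax."""
    if temperature == 0.0:
        return int(np.argmax(logits))        # greedy decoding: the T -> 0 limit
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())    # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])     # hypothetical next-token logits
print(sample(logits, 0.0, rng))   # always token 0 (argmax)
print(sample(logits, 1.0, rng))   # samples from the model's raw distribution
```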
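For 1b, a sketch of the perplexity-minimizing fit I mean: rescale the model's logits by 1/T, measure the perplexity of the observed tokens, and keep the T where it's lowest. The `logits` and `token_ids` arrays here are hypothetical stand-ins for what a real model would produce on the text being scored:

```python
import numpy as np

def perplexity_at(logits, token_ids, T):
    """Perplexity of the observed tokens under softmax(logits / T)."""
    scaled = logits / T
    # stable log-softmax over the vocabulary axis
    log_probs = scaled - scaled.max(axis=1, keepdims=True)
    log_probs -= np.log(np.exp(log_probs).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(token_ids)), token_ids].mean()
    return np.exp(nll)  # perplexity = exp(mean negative log-likelihood)

rng = np.random.default_rng(1)
logits = rng.normal(size=(50, 100))        # fake per-position model outputs
token_ids = rng.integers(0, 100, size=50)  # fake observed tokens
temps = np.linspace(0.2, 3.0, 29)
best_T = min(temps, key=lambda T: perplexity_at(logits, token_ids, T))
print(best_T)
```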
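And for 2c, a tiny demonstration of why negative temperature gives "worse than random" output: dividing logits by a negative T flips their ranking, so the least likely tokens become the most likely ones. Toy logits again:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
print(softmax(logits / 1.0))    # T = +1: mass concentrates on the high-logit tokens
print(softmax(logits / -1.0))   # T = -1: mass concentrates on the worst tokens
```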