sigmoid10 4 hours ago
> in theory theory, temperature 0 doesn't really exist.

It does very much exist, even if you go to pure math. Look at the softmax function and take the limit as T->0. It becomes a Dirac delta function. I.e., in a discrete setting (like for LLMs with a finite set of output tokens), the probability P becomes 1 for the argmax and 0 for everything else. Only in coding practice is it easier to implement T=0 as a simple if check that directly chooses the argmax instead of calculating the limit of some function that includes 1/T quotients. But setting T to zero turns the usual probability function into greedy sampling, in both theory and practice.
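To make the limit concrete, here is a minimal Python sketch of softmax with temperature. The function name `softmax_with_temperature`, the example logits, and the choice to break ties by taking the first maximum are all assumptions made for illustration, not how any particular inference library does it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature, with temperature == 0 special-cased."""
    if temperature == 0:
        # The "simple if check": all mass goes to the argmax.
        # Assumption: ties are broken by taking the first maximal index.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up example values

# As T shrinks, the distribution concentrates on the largest logit.
for t in (1.0, 0.1, 0.01):
    print(t, softmax_with_temperature(logits, t))

# T == 0 takes the shortcut branch instead of dividing by zero.
print(0, softmax_with_temperature(logits, 0))
```

With a unique largest logit, the small-T outputs approach the one-hot vector that the T=0 branch returns directly.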
317070 an hour ago
> Look at the softmax function and take the limit as T->0. It becomes a Dirac delta function.

In pure math, it does not always do that. It becomes a Dirac delta comb with equal weight on every maximum, and there can be more than one maximum. Setting the temperature to zero does turn sampling into greedy sampling, but greedy sampling is not necessarily deterministic, since you can have multiple equally optimal options.
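The tie case can be checked numerically. This is a rough Python sketch under the assumption of two exactly equal top logits; `softmax_with_temperature`, `zero_temperature_limit`, and the values in `tied` are hypothetical names and data chosen for the example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Plain softmax over logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def zero_temperature_limit(logits):
    """Limit of softmax as T -> 0: uniform over all maximal entries."""
    m = max(logits)
    n_max = sum(1 for x in logits if x == m)
    return [1.0 / n_max if x == m else 0.0 for x in logits]

tied = [3.0, 3.0, 1.0]  # made-up logits with two equal maxima

# Lowering T never separates the tied entries; each keeps half the mass.
for t in (1.0, 0.1, 0.01):
    print(t, softmax_with_temperature(tied, t))

print("limit", zero_temperature_limit(tied))
```

So the limit splits the mass evenly between the two tied tokens, and picking one of them requires an extra tie-breaking rule, which can be deterministic (e.g. lowest index) or random.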
thaumasiotes an hour ago
> It becomes a Dirac delta function. I.e., in a discrete setting (like for LLMs with a finite set of output tokens), the probability P becomes 1 for the argmax and 0 for everything else. Only in coding practice is it easier to implement T=0 as a simple if check that directly chooses the argmax instead of calculating the limit of some function that includes 1/T quotients.

I don't understand the distinction you're drawing. A Dirac delta function is a "simple if check".