| ▲ | smokel 3 hours ago | |||||||
For the uninitiated: Interestingly, it is not advisable to take this to the extreme and set temperature to 0. That would seem logical, as the results are then completely deterministic, but it turns out that a suboptimal token may result in a better answer in the long run. Also, allowing for a little bit of noise gives the model room to talk itself out of a suboptimal path. | ||||||||
| ▲ | LoganDark 3 hours ago | parent [-] | |||||||
I like to think of this like tempering the output space. With a temperature of zero, there is only one possible output and it may be completely wrong. With even a low temperature, you drastically increase the chances that the output space contains a correct answer, through containing multiple responses rather than only one. I wonder if determinism will be less harmful to diffusion models because they perform multiple iterations over the response rather than having only a single shot at each position that lacks lookahead. I'm looking forward to finding out and have been playing with a diffusion model locally for a few days. | ||||||||
| ||||||||