smokel 3 hours ago

For the uninitiated: interestingly, it is not advisable to take this to the extreme and set the temperature to 0.

That would seem logical, since the results are then, in principle, completely deterministic. But it turns out that picking a locally suboptimal token can lead to a better answer in the long run. A little bit of noise also gives the model room to talk itself out of a suboptimal path.
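
For anyone who wants to see what the knob does mechanically, here is a minimal sketch of temperature sampling over a toy set of logits. The function names (`token_probs`, `sample_token`) and the example logits are made up for illustration and do not come from any particular library. Treating temperature 0 as plain argmax is the usual convention, assumed here.

```python
import math
import random

def token_probs(logits, temperature):
    """Softmax over logits divided by temperature (temperature must be > 0)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=random):
    """Pick a token index; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        # Greedy decoding: always return the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    weights = token_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical logits for three candidate tokens.
logits = [1.0, 3.0, 2.0]
print(token_probs(logits, 0.2))  # sharply peaked on index 1
print(token_probs(logits, 0.8))  # flatter, so other tokens get a real chance
print(sample_token(logits, 0))   # greedy pick
```

Lower temperatures concentrate probability on the top token, and higher ones spread it out, which is where the "room to talk itself out of a suboptimal path" comes from.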

LoganDark 3 hours ago | parent [-]

I like to think of this like tempering the output space. With a temperature of zero, there is only one possible output, and it may be completely wrong. With even a low temperature, you drastically increase the chances that the output space contains a correct answer, because it holds multiple responses rather than only one.
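
A back-of-the-envelope way to put numbers on that, as a sketch only: assume each sampled response is independently correct with some fixed probability `p_correct` (a simplification, since changing the temperature also changes that probability). The helper name and the numbers are hypothetical.

```python
def chance_any_correct(p_correct, k):
    """Probability that at least one of k independent samples is correct."""
    return 1.0 - (1.0 - p_correct) ** k

# Hypothetical: each sample has a 30% chance of being right.
for k in (1, 3, 5, 10):
    print(k, round(chance_any_correct(0.3, k), 3))
```

With a single deterministic output you are stuck at k = 1. Any nonzero temperature lets you draw more than one distinct response.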

I wonder if determinism will be less harmful to diffusion models because they perform multiple iterations over the response rather than having only a single shot at each position that lacks lookahead. I'm looking forward to finding out and have been playing with a diffusion model locally for a few days.

reactordev 2 hours ago | parent [-]

Yup. I think of it as: how far off the rails do you want to explore?

For creative things or exploratory reasoning, a temperature of 0.8 lends itself to all sorts of excursions down the rabbit hole. However, when coding and needing something precise, a temperature of 0.2 is what I use. If I don’t like the output, I’ll rephrase or add context.