Der_Einzige 6 hours ago

Min_p author here: I’m convinced that the whole field critically misunderstands temperature (i.e., the convention of capping temperature at 2 is very harmful for diverse generation). Articles like this are excellent and very cool.
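
For anyone curious, a minimal sketch of the min-p rule: keep tokens whose probability is at least p_base times the top token's probability, renormalize, and sample. Names are illustrative rather than any particular library's API, and temperature-before-filter is one common ordering, not the only one:

    import numpy as np

    def min_p_sample(logits, p_base=0.1, temperature=1.0,
                     rng=np.random.default_rng()):
        # Temperature scaling first; the min-p threshold tracks the top
        # token's probability, so it adapts as high temperature flattens
        # the distribution (which is why it tolerates high temperatures).
        z = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(z - z.max())
        probs /= probs.sum()
        # Keep only tokens at least p_base as likely as the top token.
        probs = np.where(probs >= p_base * probs.max(), probs, 0.0)
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))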

Hacking your LLM inference engine to enable cool sampling tricks is the definition of AI research/engineering. We need more of this and less prompt grifting.

wolttam 4 hours ago | parent | next

Okay, something just clicked in my brain. Do higher temperatures essentially unlock additional paths for a model to go down when solving a particular problem? If so, for some particularly tricky problems you could run many evaluations at a high temperature, in hopes that the model happens to take the correct approach in one of them.

Edit: What seems to break this is that high temperature acts /continuously/, destabilizing the model's output at every token. It could be useful to sample at a high temperature until it's evident the model has started on a new approach, and then switch to a lower temperature from there.
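
A rough sketch of that two-phase idea, assuming a hypothetical next_logits(ids) hook that returns the model's next-token logits (substitute your inference engine's equivalent):

    import numpy as np

    def sample_token(logits, temperature, rng):
        # Plain temperature sampling over a logits vector.
        z = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(z - z.max())
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    def two_phase_generate(next_logits, prompt_ids, n_explore=16,
                           t_explore=2.0, t_exploit=0.7, max_new=256,
                           eos_id=None, seed=0):
        # Sample the first n_explore tokens hot, to pick an approach,
        # then drop to a low temperature to execute it stably.
        rng = np.random.default_rng(seed)
        ids = list(prompt_ids)
        for step in range(max_new):
            t = t_explore if step < n_explore else t_exploit
            tok = sample_token(next_logits(ids), t, rng)
            ids.append(tok)
            if tok == eos_id:
                break
        return ids

Detecting that the model has "started a new approach" is the hard part; the fixed n_explore cutoff here is a crude stand-in for that.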

wongarsu 3 hours ago | parent

Decaying temperature might be a good approach: generate the first token at a high temperature (like 20), then for each subsequent token multiply the temperature by 0.9 (or some other decay factor) until you reach your steady-state target temperature.
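
As a sketch (the helper name and constants are illustrative):

    def decayed_temperature(step, t0=20.0, decay=0.9, t_floor=0.7):
        # Exponential per-token decay, clamped at the steady-state target.
        return max(t0 * decay ** step, t_floor)

With t0=20 and a 0.9 decay, token 10 is sampled near temperature 7.0, token 20 near 2.4, and everything from roughly token 32 onward sits at the 0.7 floor.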

bjourne 6 hours ago | parent | prev

Correct me if I'm wrong, but the problem is that it is almost impossible to evaluate sampling methods. You can't just look at perplexity and conclude that A is better than B, so you need large-scale, expensive human evaluations. And even with those, it is difficult to extrapolate the results, since which sampling method works best depends on the dataset(s).