317070 5 hours ago
> so in principle, setting temperature to 0 _should_ result in deterministic outputs

This is a common misconception, but it is not true even in principle. If two or more logits are tied for the maximum, I will sample uniformly at random among them at any temperature, even zero. Sampling from softmax([1, 0, 1]) is still stochastic at temperature 0, because in the limit you sample uniformly between the first and the last element.

Anyway: "GPUs don't do deterministic matrix multiplications" is the biggest source of randomness in LLMs. GPUs put the associativity of the sums in matrix multiplications in arbitrary order, and this has a huge impact on the logits coming out of the neural network.
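As a rough illustration of the temperature limit described in this comment (not the commenter's code; the helper name `softmax_with_temperature` is made up for this sketch), dividing the logits by a shrinking temperature pushes the distribution for [1, 0, 1] toward an even split between the two tied maxima rather than toward a single winner:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, stabilised by subtracting the max logit."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 0.0, 1.0]

# As the temperature shrinks, the middle entry vanishes while the two
# tied maxima keep equal probability, approaching 0.5 each.
for t in (1.0, 0.1, 0.01):
    print(t, softmax_with_temperature(logits, t))
```

Temperature exactly 0 cannot be plugged in (it would divide by zero), so an implementation has to pick a convention for that case; the limit above is one candidate convention.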
EvgeniyZh 5 hours ago
You don't have to sample uniformly. You could take the lowest index among all maxima. But yeah, the main source of randomness is non-deterministic matmul, and temperature has no effect on it.
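A minimal sketch of the tie-breaking rule suggested here, assuming plain Python lists of logits (the function name `greedy_pick` is hypothetical): picking the lowest index among the maxima makes the choice a pure function of the logits.

```python
def greedy_pick(logits):
    """Return the lowest index whose logit equals the maximum."""
    best = max(logits)
    # list.index returns the first (lowest) matching position,
    # so ties are always resolved the same way.
    return logits.index(best)

print(greedy_pick([1.0, 0.0, 1.0]))  # tie between index 0 and index 2
```

This only removes randomness from the sampling step; it does nothing about logits that differ from run to run.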
jstanley 3 hours ago
> "GPUs don't do deterministic matrix multiplications" is the biggest source of randomness in LLMs.

But this isn't a fundamental property of LLMs, it's just an implementation detail. It's pretty obvious that if you evaluate the matrix multiplications correctly and deterministically sample from the highest-probability outputs, you will have a deterministic LLM.
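A small CPU-only sketch of why the summation order matters, and why fixing it restores reproducibility. It assumes IEEE 754 double-precision floats, and `ordered_sum` is a name invented for this example; it says nothing about how any particular GPU kernel schedules its additions.

```python
def ordered_sum(values):
    """Add values strictly left to right, with no reordering or compensation."""
    total = 0.0
    for v in values:
        total += v
    return total

a = [1e16, 1.0, -1e16]   # the 1.0 is absorbed by rounding before the cancellation
b = [1e16, -1e16, 1.0]   # the large terms cancel first, so the 1.0 survives

# Same numbers, different order, different result.
print(ordered_sum(a), ordered_sum(b))

# Same numbers, same order: the result is identical on every run.
print(ordered_sum(a) == ordered_sum(a))
```

An explicit loop is used instead of the built-in `sum` because recent Python versions may apply compensated summation to floats, which would hide the effect.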
DougBTX 3 hours ago
> GPUs put the associativity of the sums in matrix multiplications in arbitrary order

That's user-controlled too, not an inherent property of GPUs: https://docs.pytorch.org/docs/2.12/generated/torch.use_deter...