vessenes an hour ago
To be clear, temperature 0 is deterministic in principle: it will produce the same output for exact duplicate inputs, across all seed choices. Provided:

* If it's MoE we are talking about, the duplicate inputs are for the whole batch (yes, your batch neighbours can impact your choice of experts. Blergh.)
* Your kernels are deterministic
* There's no system-wide effort switch that responds to, e.g., workload across the cluster (for a thinking model)

Upshot: temperature 0 is probably not deterministic on any existing cloud infra, but it could be for edge inference pretty reliably.

To your quibble on 0.1 being more deterministic: I think it's a pretty fair summary. We're going to sample the 'temp 0' answer much more often at 0.1 than we would at temp 0.9, no?
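
To put a rough number on the 0.1 vs 0.9 point, here is a minimal sketch of temperature-scaled softmax. The three logits are made up for illustration (they are not from any real model), and `softmax_with_temperature` is a hypothetical helper name, not any library's API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into sampling probabilities at a temperature > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens; index 0 is the greedy
# ("temp 0") pick because it has the largest logit.
logits = [2.0, 1.5, 0.5]

p_low = softmax_with_temperature(logits, 0.1)
p_high = softmax_with_temperature(logits, 0.9)

print(f"P(greedy token) at T=0.1: {p_low[0]:.3f}")
print(f"P(greedy token) at T=0.9: {p_high[0]:.3f}")
```

With these particular logits the greedy token gets roughly 99% of the mass at 0.1 and a bit under 60% at 0.9. Temperature 0 itself is the limit of this (plain argmax), which is why the function above only accepts positive temperatures.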
Dylan16807 an hour ago
Even then it's deterministic in the way a hash function is deterministic. Change one letter and you can get a completely different output. What people actually want is something continuous.
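
A quick sketch of the hash analogy, using SHA-256 from Python's standard library. The two prompt strings are made up for illustration, and `digest` is just a hypothetical wrapper name.

```python
import hashlib

def digest(text):
    """Hex SHA-256 digest of a string: same input, same output, every time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Two hypothetical prompts that differ only in the final character.
a = digest("What is the capital of France?")
b = digest("What is the capital of France.")

# Deterministic: hashing the same string again gives the same digest.
repeatable = digest("What is the capital of France?") == a

# Not continuous: count how many of the 64 hex positions still agree.
same_positions = sum(x == y for x, y in zip(a, b))

print(repeatable)
print(a)
print(b)
print(f"{same_positions} of {len(a)} hex characters match")
```

The digests are perfectly repeatable, but a one-character edit gives an unrelated-looking result. That is the sense in which determinism alone doesn't buy you stability under small input changes.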