vessenes an hour ago

To be clear, temperature 0 is deterministic and will produce the same output for exact duplicate inputs, across all seed choices.

Provided:

* If we’re talking about MoE, the duplicate inputs have to cover the whole batch (yes, your batch neighbours can affect which experts you get routed to. Blergh.)

* Your kernels are deterministic

* There’s no system-wide effort switch that responds to, e.g., workload across the cluster (for a thinking model)
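A minimal sketch of why the kernel condition matters, assuming the usual explanation that floating-point addition is not associative, so a reduction that sums in a different order can land on a slightly different logit. The values and the `greedy_pick` helper are made up for illustration and do not come from any real inference stack:

```python
# Same three terms, two summation orders, as a parallel reduction might produce.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

def greedy_pick(logits):
    """Temperature-0 decoding: index of the largest logit (first index wins ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])

rival = 0.6  # hypothetical logit of a competing token

# The second logit is "the same" sum, computed in two different orders.
pick_a = greedy_pick([rival, left_to_right])
pick_b = greedy_pick([rival, right_to_left])

print(left_to_right == right_to_left)
print(pick_a, pick_b)
```

When two logits are this close, the summation order alone decides which token greedy decoding emits, and every later token is conditioned on that choice.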

Upshot:

Temperature 0 is not deterministic in probably any existing cloud infra, but it could be for edge inference pretty reliably.

To your quibble on 0.1 being more deterministic - I think it’s a pretty fair summary - we’re going to sample much more from the ‘temp 0’ answer at 0.1 than we would at temp 0.9, no?
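To put rough numbers on that, here is a small sketch of temperature-scaled softmax. The logits are hypothetical, and this is the textbook formula rather than any particular provider's sampler:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Sampling probabilities for logits divided by temperature (temperature > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

logits = [2.0, 1.0, 0.5]  # hypothetical values; index 0 is the 'temp 0' answer

for temperature in (0.1, 0.9):
    probs = softmax_with_temperature(logits, temperature)
    print(f"T={temperature}: P(top token) = {probs[0]:.4f}")
```

With these logits the top token takes nearly all of the probability mass at 0.1 and roughly two thirds of it at 0.9, so 0.1 is "more deterministic" only in the sense of being far more concentrated on the greedy answer.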

Dylan16807 an hour ago | parent [-]

Even then it's deterministic in the way a hash function is deterministic. Change one letter and you can get a completely different output. What people actually want is something continuous.
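A quick illustration of the hash analogy, using SHA-256 from the Python standard library. The two prompts are made-up examples that differ by a single letter:

```python
import hashlib

def digest(text):
    """SHA-256 digest of the UTF-8 encoded text, as raw bytes."""
    return hashlib.sha256(text.encode("utf-8")).digest()

def differing_bits(a, b):
    """Number of bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

prompt_a = "Summarise this report"
prompt_b = "Summarize this report"  # one letter changed

# Deterministic: the same input always yields the same digest.
repeatable = digest(prompt_a) == digest(prompt_a)

# Not continuous: a one-letter change scrambles the output.
changed = differing_bits(digest(prompt_a), digest(prompt_b))

print(repeatable)
print(changed, "of 256 bits differ")
```

Repeating the input reproduces the output exactly, but a nearby input tells you nothing about what its output will look like.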

guhcampos an hour ago | parent [-]

This is it. People mistake deterministic for precise/exact/correct. They're not the same thing.