nemo1618 | 2 hours ago
> But like humans — and unlike computer programs — they do not produce the exact same results every time they are used. This is fundamental to the way that LLMs operate: based on the "weights" derived from their training data, they calculate the likelihood of possible next words to output, then randomly select one (in proportion to its likelihood).

This is emphatically not fundamental to LLMs! Yes, the next token is selected randomly; but "randomly" could mean "chosen using an RNG with a fixed seed." Indeed, many APIs used to support a "temperature" parameter that, when set to 0, would result in fully deterministic output. These parameters were slowly removed or made non-functional, though, and the reason has never been entirely clear to me.

My current guess is that it is some combination of:

A) 99% of users don't care,
B) perfect determinism would require not just a seeded RNG, but also fixing a bunch of data races that are currently benign, and
C) deterministic output might be exploitable in undesirable ways, or lead to bad PR somehow.
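To make the point concrete, here is a minimal sketch (not any provider's actual implementation; `sample_next_token` is a hypothetical helper) of the sampling step: temperature 0 collapses to a deterministic argmax, and a seeded RNG makes even nonzero-temperature sampling reproducible.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Pick the next token index from raw logits."""
    # Temperature 0 -> greedy decoding: always the most likely token.
    if temperature == 0:
        return int(np.argmax(logits))
    # Otherwise sample in proportion to softmax(logits / temperature).
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for stability
    probs /= probs.sum()
    rng = rng or np.random.default_rng()   # unseeded -> nondeterministic
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.5]

# Greedy decoding is deterministic: same logits, same token every time.
assert sample_next_token(logits, temperature=0) == 0

# A fixed seed makes sampling reproducible too.
a = sample_next_token(logits, rng=np.random.default_rng(42))
b = sample_next_token(logits, rng=np.random.default_rng(42))
assert a == b
```

The catch, as the comment notes, is that this only yields end-to-end determinism if the logits themselves are bit-identical on every run.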
pavpanchekha | 2 hours ago
Deterministic output is incompatible with batching, which in turn is critical to high utilization on GPUs, which in turn is necessary to keep costs low.
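One concrete mechanism behind this (my gloss, not stated in the comment): floating-point addition is not associative, so when batching changes the shape of a reduction, GPU kernels can sum the same values in a different order and produce logits that differ in the last bits, which can flip a near-tie at sampling time. A minimal illustration:

```python
# Floating-point addition is not associative: regrouping the same three
# values changes the rounding, so the results differ in the last bit.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # one reduction order
right = a + (b + c)  # another reduction order
assert left != right                 # not bit-identical...
assert abs(left - right) < 1e-15     # ...though the difference is tiny
```

In a batched matmul the reduction order depends on how requests are packed together, so the "same" prompt can yield slightly different logits run to run even with a seeded sampler.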