nemo1618 | 2 hours ago
> But like humans — and unlike computer programs — they do not produce the exact same results every time they are used. This is fundamental to the way that LLMs operate: based on the "weights" derived from their training data, they calculate the likelihood of possible next words to output, then randomly select one (in proportion to its likelihood).

This is emphatically not fundamental to LLMs! Yes, the next token is selected randomly; but "randomly" could mean "chosen using an RNG with a fixed seed." Indeed, many APIs used to support a "temperature" parameter that, when set to 0, would result in fully deterministic output. These parameters were slowly removed or made non-functional, though, and the reason has never been entirely clear to me.

My current guess is that it is some combination of:

A) 99% of users don't care,
B) perfect determinism would require not just a seeded RNG, but also fixing a bunch of data races that are currently benign, and
C) deterministic output might be exploitable in undesirable ways, or lead to bad PR somehow.
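To make the point concrete, here is a minimal sketch (not any provider's actual implementation; `sample_next_token` is a hypothetical helper) of the sampling step: temperature 0 collapses to a deterministic argmax, and a seeded RNG makes even nonzero-temperature sampling reproducible.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Pick the next token index from raw logits."""
    # Temperature 0 -> greedy decoding: always the most likely token.
    if temperature == 0:
        return int(np.argmax(logits))
    # Otherwise sample in proportion to softmax(logits / temperature).
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for stability
    probs /= probs.sum()
    rng = rng or np.random.default_rng()   # unseeded -> nondeterministic
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.5]

# Greedy decoding is deterministic: same logits, same token every time.
assert sample_next_token(logits, temperature=0) == 0

# A fixed seed makes sampling reproducible too.
a = sample_next_token(logits, rng=np.random.default_rng(42))
b = sample_next_token(logits, rng=np.random.default_rng(42))
assert a == b
```

The catch, as the comment notes, is that this only yields end-to-end determinism if the logits themselves are bit-identical on every run.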
pavpanchekha | 2 hours ago
Deterministic output is incompatible with batching, which in turn is critical to high utilization on GPUs, which in turn is necessary to keep costs low.
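One concrete mechanism behind this (my gloss, not stated in the comment): floating-point addition is not associative, so when batching changes the shape of a reduction, GPU kernels can sum the same values in a different order and produce logits that differ in the last bits, which can flip a near-tie at sampling time. A minimal illustration:

```python
# Floating-point addition is not associative: regrouping the same three
# values changes the rounding, so the results differ in the last bit.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # one reduction order
right = a + (b + c)  # another reduction order
assert left != right                 # not bit-identical...
assert abs(left - right) < 1e-15     # ...though the difference is tiny
```

In a batched matmul the reduction order depends on how requests are packed together, so the "same" prompt can yield slightly different logits run to run even with a seeded sampler.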