strix_varius 5 days ago

That isn't how LLMs work.

They are probabilistic. Running them even on different hardware yields different results, and those differences compound the longer your context and the more tokens you generate (as when writing code).
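
For context on why hardware matters (a minimal sketch, unrelated to any particular model): floating-point addition is not associative, so when different hardware or kernels accumulate the same sums in a different order, the results differ in the low-order bits, and over a long generation those bits can flip which token comes out on top.

    import numpy as np

    # Floating-point addition is not associative: summing the same
    # numbers in a different order gives a slightly different total.
    xs = np.random.default_rng(0).standard_normal(100_000).astype(np.float32)

    fwd = np.float32(0.0)
    for x in xs:
        fwd += x

    rev = np.float32(0.0)
    for x in xs[::-1]:
        rev += x

    print(fwd == rev)   # typically False
    print(fwd, rev)     # equal mathematically, unequal in float32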

But more importantly, always selecting the most likely token (greedy decoding) traps the LLM in repetition loops, reduces overall quality, and is infeasible at scale.
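
A minimal sketch of the difference, using made-up next-token logits (illustrative only, not any real model's API): greedy decoding always returns the same index, which is what invites repetition loops, while temperature sampling draws from the softmax distribution.

    import numpy as np

    def greedy(logits):
        # Always the single most likely token: same logits, same choice.
        return int(np.argmax(logits))

    def sample(logits, temperature=1.0, rng=None):
        # Scale logits by temperature and draw from the softmax
        # distribution; repeated calls can pick different tokens.
        if rng is None:
            rng = np.random.default_rng()
        z = logits / temperature
        z -= z.max()          # numerical stability
        p = np.exp(z)
        p /= p.sum()
        return int(rng.choice(len(logits), p=p))

    logits = np.array([2.0, 1.5, 0.3])           # hypothetical logits
    print([greedy(logits) for _ in range(5)])    # [0, 0, 0, 0, 0]
    print([sample(logits) for _ in range(5)])    # varies run to run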

There are reasons why literally no LLM you use runs deterministically.

vidarh a day ago

With temperature set to zero, they are deterministic if inference is implemented with deterministic calculations.

Only when you turn the temperature up do they become probabilistic for a given input. If you take shortcuts in implementing the inference, then sure, rounding errors may accumulate and prevent that, but that is an issue not with the models but with your choice of how to implement the inference.
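
To make that concrete (a sketch with made-up logits, independent of any specific inference stack): dividing the logits by a temperature T before the softmax pushes all probability onto the argmax as T approaches zero, so temperature-zero decoding is just a deterministic argmax, and any remaining nondeterminism has to come from how the logits themselves were computed.

    import numpy as np

    def softmax(logits, temperature):
        z = logits / temperature
        z -= z.max()          # numerical stability
        p = np.exp(z)
        return p / p.sum()

    logits = np.array([2.0, 1.5, 0.3])    # hypothetical logits
    for t in (1.0, 0.1, 0.01):
        print(t, np.round(softmax(logits, t), 4))
    # As t shrinks the distribution collapses onto argmax(logits);
    # at t=0.01 it prints [1. 0. 0.], i.e. greedy decoding, which is
    # deterministic as long as the logits are computed deterministically.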