|
| ▲ | NitpickLawyer 5 days ago | parent | next [-] |
| It gets more complicated with things like batch processing. Depending on where in the stack your query gets placed, how the underlying hardware works, and how the software stack was implemented, you might get small differences that compound over many token generations. (vLLM, a popular inference engine, has this problem as well.) |
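A toy sketch of the underlying effect (plain Python, not vLLM code): summing the exact same values with two different accumulation strategies can give results that differ in the last bit, which is what happens at scale when batch size or placement changes a GPU's reduction order.

```python
import math

# The same ten values, summed two ways. Naive left-to-right
# accumulation rounds after every step; math.fsum uses compensated
# (Shewchuk) summation, which returns the correctly rounded exact sum.
xs = [0.1] * 10

sequential = sum(xs)         # left-to-right accumulation
compensated = math.fsum(xs)  # exactly rounded sum

print(sequential)   # 0.9999999999999999
print(compensated)  # 1.0
```

Neither answer is "wrong"; they are both valid roundings of intermediate results, and which one you get depends purely on the order of operations.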
|
| ▲ | danpalmer 5 days ago | parent | prev | next [-] |
| Not necessarily. This is a good blog post from a few days ago about it: https://thinkingmachines.ai/blog/defeating-nondeterminism-in... |
| |
|
| ▲ | bschwindHN 5 days ago | parent | prev | next [-] |
| Previous discussion: https://news.ycombinator.com/item?id=19567011 And a quora link (sorry): https://www.quora.com/If-floating-point-addition-isnt-associ... |
|
| ▲ | HDThoreaun 5 days ago | parent | prev [-] |
| Associativity breaks down with floating point math because of rounding error (the Quora link above covers this for addition). If the engine is multithreaded, it's pretty easy to see how the ordering of operations in a reduction can change from run to run, which can change the output. |
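The canonical demonstration of non-associativity, in any language with IEEE 754 doubles (Python here): regrouping the same three additions gives two different answers, exactly the way a different thread interleaving would.

```python
# The same three values added in two groupings:
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

print(left == right)  # False
```

Each intermediate sum is rounded to the nearest representable double, so where the parentheses fall determines where the rounding happens.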