|
| ▲ | NitpickLawyer 5 days ago | parent | next [-] |
| It gets more complicated with things like batch processing. Depending on where in the stack your query gets placed, how the underlying hardware works, and how the software stack was implemented, you might get small differences that compound over many token generations. (vLLM, a popular inference engine, has this problem as well.) |
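A toy sketch of the underlying effect (plain Python, not vLLM code): summing the exact same values with two different accumulation strategies can give results that differ in the last bit, which is what happens at scale when batch size or placement changes a GPU's reduction order.

```python
import math

# The same ten values, summed two ways. Naive left-to-right
# accumulation rounds after every step; math.fsum uses compensated
# (Shewchuk) summation, which returns the correctly rounded exact sum.
xs = [0.1] * 10

sequential = sum(xs)         # left-to-right accumulation
compensated = math.fsum(xs)  # exactly rounded sum

print(sequential)   # 0.9999999999999999
print(compensated)  # 1.0
```

Neither answer is "wrong"; they are both valid roundings of intermediate results, and which one you get depends purely on the order of operations.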
|
| ▲ | danpalmer 5 days ago | parent | prev | next [-] |
| Not necessarily. This is a good blog post from a few days ago about it: https://thinkingmachines.ai/blog/defeating-nondeterminism-in... |
| |
|
| ▲ | bschwindHN 5 days ago | parent | prev | next [-] |
| Previous discussion: https://news.ycombinator.com/item?id=19567011 And a quora link (sorry): https://www.quora.com/If-floating-point-addition-isnt-associ... |
|
| ▲ | HDThoreaun 5 days ago | parent | prev [-] |
| Associativity breaks down with floating point math because of rounding error (the Quora link above covers this for addition). If the engine is multithreaded, it's pretty easy to see how the ordering of operations in a reduction can change from run to run, which can change the output. |
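The canonical demonstration of non-associativity, in any language with IEEE 754 doubles (Python here): regrouping the same three additions gives two different answers, exactly the way a different thread interleaving would.

```python
# The same three values added in two groupings:
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

print(left == right)  # False
```

Each intermediate sum is rounded to the nearest representable double, so where the parentheses fall determines where the rounding happens.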