abtinf 4 hours ago
When processing multiple prompts simultaneously (that is, the typical use case under load), LLMs are nondeterministic, even with a specific seed and zero temperature, due to floating-point errors.
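A minimal Python illustration of the underlying effect. It assumes "floating-point errors" here means rounding that depends on the order of operations. Floating-point addition rounds after every step, so it is not associative: regrouping the same three numbers gives different bits.

```python
# Floating-point addition rounds after every step, so the grouping of
# operations changes the result even though real-number math says it shouldn't.
a, b, c = 0.1, 0.2, 0.3

left_first = (a + b) + c   # 0.30000000000000004 + 0.3
right_first = a + (b + c)  # 0.1 + 0.5

print(left_first)                 # 0.6000000000000001
print(right_first)                # 0.6
print(left_first == right_first)  # False
```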
kragen an hour ago
This is very interesting, thanks!

> While this hypothesis is not entirely wrong, it doesn’t reveal the full picture. For example, even on a GPU, running the same matrix multiplication on the same data repeatedly will always provide bitwise equal results. We’re definitely using floating-point numbers. And our GPU definitely has a lot of concurrency. Why don’t we see nondeterminism in this test?
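Here is a rough sketch of the distinction the quoted passage draws. It assumes a reduction whose order is fixed by a chunk size. The hypothetical `chunked_sum` below is a toy stand-in for a parallel kernel, not how any real GPU library is written. Repeating the same computation with the same chunking is bitwise reproducible. Changing only the chunking regroups the additions and can change the result.

```python
def chunked_sum(values, chunk_size):
    """Sum `values` in fixed-size chunks, then sum the partial results.

    Hypothetical toy model: loosely mimics a parallel reduction in which each
    chunk is summed by one worker. The order of additions depends only on
    `chunk_size`, never on timing, so a given chunk size always yields the
    same bits.
    """
    partials = []
    for start in range(0, len(values), chunk_size):
        partial = 0.0
        for v in values[start:start + chunk_size]:
            partial += v
        partials.append(partial)

    total = 0.0
    for p in partials:
        total += p
    return total


values = [0.1, 0.0, 0.2, 0.3]

# Same data, same chunking, many runs: every result is bitwise identical.
repeated = {chunked_sum(values, 2) for _ in range(1000)}
print(repeated)  # {0.6}

# Same data, different chunking: the additions are grouped differently.
print(chunked_sum(values, 1))  # 0.6000000000000001
print(chunked_sum(values, 2))  # 0.6
```

In this toy model, running the same computation again never changes the output. The output changes only when the grouping of the additions changes.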