abtinf 4 hours ago

When processing multiple prompts simultaneously (that is, the typical use case under load), LLMs are nondeterministic, even with a fixed seed and zero temperature, due to floating-point rounding errors.
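
A minimal sketch of the floating-point effect, in plain Python rather than a real inference kernel. The names `ordered_sum` and `chunked_sum` are made up for illustration, and the chunk size only stands in for the way a reduction might be split differently depending on how many prompts share a batch (an assumption, not a description of any particular serving stack). The loop adds values one at a time on purpose, because the built-in `sum` may use compensated summation on newer Python versions and hide the effect.

```python
def ordered_sum(values):
    """Add floats strictly left to right, one rounding step per addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk_size):
    """Reduce fixed-size chunks first, then add the partial sums (hypothetical helper)."""
    partials = [
        ordered_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return ordered_sum(partials)


# Floating-point addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Same inputs, different grouping, different answer (the exact sum is 2.0).
values = [1e16, 1.0, -1e16, 1.0]
one_chunk = chunked_sum(values, 4)
two_chunks = chunked_sum(values, 2)
print(one_chunk, two_chunks)  # 1.0 0.0
```

Neither grouping is "the wrong one"; each is a valid rounding of the same sum, and the result changes only because the order of additions changed.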

See https://news.ycombinator.com/item?id=45200925

kragen an hour ago

This is very interesting, thanks!

> While this hypothesis is not entirely wrong, it doesn’t reveal the full picture. For example, even on a GPU, running the same matrix multiplication on the same data repeatedly will always provide bitwise equal results. We’re definitely using floating-point numbers. And our GPU definitely has a lot of concurrency. Why don’t we see nondeterminism in this test?
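
The test described in the quote can be imitated on a small scale. This is only a sketch under loud assumptions: it is pure Python on the CPU, not a GPU kernel, and `matmul` and `bits` are illustrative helpers, not anything from the linked article. It shows the narrower point that a computation performing the same floating-point operations in the same order is bitwise repeatable, rounding errors and all.

```python
import struct


def matmul(a, b):
    """Naive matrix multiply with a fixed left-to-right accumulation order."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += a[i][k] * b[k][j]
            out[i][j] = acc
    return out


def bits(matrix):
    """Serialize every entry as a raw IEEE 754 double so equality is bitwise."""
    return b"".join(struct.pack("<d", x) for row in matrix for x in row)


a = [[0.1, 0.2, 0.3], [1e16, 1.0, -1e16]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

reference = bits(matmul(a, b))
all_equal = all(bits(matmul(a, b)) == reference for _ in range(100))
print(all_equal)  # True
```

The results are inexact but identical on every run, which is consistent with the quote: rounding alone does not produce run-to-run differences unless the order of operations also changes.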