Der_Einzige 5 hours ago

This belief (that LLMs are deterministic except for their samplers) is very wrong, and assuming it's true will get you into a hilariously large amount of trouble.

Also greedy sampling considered harmful: https://arxiv.org/abs/2506.09501

From the abstract:

"For instance, under bfloat16 precision with greedy decoding, a reasoning model like DeepSeek-R1-Distill-Qwen-7B can exhibit up to 9% variation in accuracy and 9,000 tokens difference in response length due to differences in GPU count, type, and evaluation batch size. We trace the root cause of this variability to the non-associative nature of floating-point arithmetic under limited numerical precision. This work presents the first systematic investigation into how numerical precision affects reproducibility in LLM inference. Through carefully controlled experiments across various hardware, software, and precision settings, we quantify when and how model outputs diverge. Our analysis reveals that floating-point precision—while critical for reproducibility—is often neglected in evaluation practices."
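The non-associativity the abstract blames is easy to see on its own (this snippet is mine, not from the paper): IEEE-754 addition rounds after every operation, so the order in which partial sums are reduced, which shifts with GPU count, kernel tiling, and batch size, changes the final result even on identical inputs.

```python
# Same three numbers, two reduction orders, two different float64 results.
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
print(left == right)       # False: addition order matters under rounding
print(left - right)        # tiny, but nonzero
```

Scale that tiny discrepancy across billions of accumulations in a forward pass and the logits can shift enough to flip an argmax, which is how even greedy decoding ends up non-deterministic across hardware configurations.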

marcinzm an hour ago | parent | next [-]

Does this apply to TPUs or just GPUs?

sgt101 2 hours ago | parent | prev [-]

Great reference - thanks.