danpalmer 2 hours ago:
I'm pretty sure that the determinism issue is at the floating-point math level, or even the hardware level. Just disabling batching and reducing the temperature to 0 does not result in truly deterministic answers.
nnevatie an hour ago:
The FP math itself is deterministic. However, the environments in which inference is run, and batching in particular, make current LLM services practically non-deterministic.
orbital-decay 2 hours ago:
FP math itself is deterministic on real hardware, provided the order of operations stays the same. Output reproducibility is much less of a problem than it seems; see for example https://docs.vllm.ai/en/latest/usage/reproducibility/