danpalmer 2 hours ago:
I'm pretty sure that the determinism issue is at the floating-point math level, or even the hardware level. Just disabling batching and reducing the temperature to 0 does not result in truly deterministic answers.
nnevatie an hour ago:
The FP math itself is deterministic. However, the environments in which inference is run, and batching in particular, make current LLM services practically non-deterministic.
orbital-decay 2 hours ago:
FP math itself is deterministic on real hardware, provided the order of operations stays the same. Output reproducibility is much less of a problem than it seems; see for example https://docs.vllm.ai/en/latest/usage/reproducibility/