There is nothing intrinsic to LLM prevents reproducibility. You can run them deterministically without adding noise, it would just be a lot slower to have a deterministic order of operations, which takes an already bad idea and makes it worse.

▲

candiddevmike 3 hours ago | parent [-]

Please tell me how to do this with any of the inference providers or a tool like llama.cpp, and make it work across machines/GPUs. I think you could maybe get close to deterministic output, but you'll always risk having some level of randomness in the output.

	▲	cjbgkagh 2 hours ago \| parent \| next [-]
		Just because you can’t do it with your chosen tools it does not mean it cannot be done. I’ve already granted the premise that it is impractical. Unless there is a framework that already guarantees determinism you’ll have to roll your own, which honestly isn’t that hard to do. You won’t get competitive performance but that’s already being sacrificed for determinism so you wouldn’t get that anyway.
	▲	wat10000 11 minutes ago \| parent \| prev [-]
		It's just arithmetic, and computer arithmetic is deterministic. On a practical level, existing implementations are nondeterministic because they don't take care to always perform mathematically commutative operations in the same order every time. Floating-point arithmetic is not commutative, so those variations change the output. It's absolutely possible to fix this and perform the operations in the same order every time, implementors just don't bother. It's not very useful, especially when almost everything runs with a non-zero temperature. I think the whole nondeterminism thing is overblown anyway. Mathematical nondeterminism and practical nondeterminism aren't the same thing. With a compiler, it's not just that identical input produces identical output. It's also that semantically identical input produces semantically identical output. If I add an extra space somewhere whitespace isn't significant in the language I'm using, this should not change the output (aside from debug info that includes column numbers, anyway). My deterministic JSON decoder should not only decode the same values for two runs on identical JSON, a change in one value in the input should produce the same values in the output except for the one that changed. LLMs inherently fail at this regardless of temperature or determinism.