determinism v nondeterminism is and has never been an issue. also all llms are 100% deterministic, what is non deterministic are the sampling parameters used by the inference engine. which by the way can be easily made 100% deterministic by simply turning off things like batching. this is a matter for cloud based api providers as you as the end user doesnt have acess to the inferance engine, if you run any of your models locally in llama.cpp turning off some server startup flags will get you the deterministic results. cloud based api providers have no choice but keeping batching on as they are serving millions of users and wasting precious vram slots on a single user is wasteful and stupid. see my code and video as evidence if you want to run any local llm 100% deterministocally https://youtu.be/EyE5BrUut2o?t=1

▲

nazgul17 9 hours ago | parent [-]

That's not an interesting difference, from my point of view. The box m black box we all use is non deterministic, period. Doesn't matter where on the inside the system stops being deterministic: if I hit the black box twice, I get two different replies. And that doesn't even matter, which you also said.

The more important property is that, unlike compilers, type checkers, linters, verifiers and tests, the output is unreliable. It comes with no guarantees.

One could be pedantic and argue that bugs affect all of the above. Or that cosmic rays make everything unreliable. Or that people are non deterministic. All true, but the rate of failure, measured in orders of magnitude, is vastly different.

▲

nowittyusername 9 hours ago | parent [-]

My man did you even check my video, did you even try the app. This is not "bug related" nowhere did i say it was a bug. Batch processing is a FEATURE that is intentionally turned on in the inference engine for large scale providers. That does not mean it has to be on. If they turn off batch processing al llm api calls will be 100% deterministic but it will cost them more money to provide the services as now you are stuck with providing 1 api call per GPU. "if I hit the black box twice, I get two different replies" what you are saying here is 100% verifiably wrong. Just because someone chose to turn on a feature in the inference engine to save money does not mean llms are anon deterministic. LLM's are stateless. their weights are froze, you never "run" an LLM, you can only sample it. just like a hologram. and depending on the inference sampling settings you use is what determines the outcome.....

	▲	pegasus 6 hours ago \| parent [-]
		Correct me if I'm wrong, but even with batch processing turned off, they are still only deterministic as long as you set the temperature to zero? Which also has the side-effect of decreasing creativity. But maybe there's a way to pass in a seed for the pseudo-random generator and restore determinism in this case as well. Determinism, in the sense of reproducible. But even if so, "determinism" means more than just mechanical reproducibility for most people - including parent, if you read their comment carefully. What they mean is: in some important way predictable for us humans. I.e. no completely WTF surprises, as LLMs are prone to produce once in a while, regardless of batch processing and temperature settings.