adrian_b 3 hours ago
You are right that, as commonly implemented, the evaluation of an LLM may be non-deterministic even when explicit randomization is eliminated, due to various race conditions in concurrent evaluation. However, if you carefully evaluate the LLM core function, i.e. in a fixed order, you will obtain perfectly deterministic results (except on some consumer GPUs where, due to memory overclocking, memory errors are frequent, which causes slightly erroneous results with non-deterministic errors). So if you want deterministic LLM results, you must audit the programs that you are using, eliminate the causes of non-determinism, and use good hardware. This may require some work, but it can be done, much like the work needed to build a software package deterministically instead of obtaining different executable files at each recompilation from the same sources.
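A minimal sketch of why a fixed evaluation order matters, assuming the usual cause: floating-point addition is not associative, so the same values summed in a different order can give a different result, while the same order reproduces the result exactly. The helper `ordered_sum` and the three values are made up for illustration and are not taken from any inference engine.

```python
def ordered_sum(values, order):
    """Sum `values` in exactly the index order given by `order`."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Three doubles whose sum depends on the order of the additions.
values = [1e16, 1.0, -1e16]

# (1e16 + 1.0) - 1e16: the 1.0 is lost to rounding before the subtraction.
forward = ordered_sum(values, [0, 1, 2])

# (1e16 - 1e16) + 1.0: the large terms cancel first, so the 1.0 survives.
reordered = ordered_sum(values, [0, 2, 1])

# Repeating the same order gives a bit-identical result every time.
repeated = ordered_sum(values, [0, 1, 2])

print(forward, reordered, forward == repeated)
```

A concurrent reduction that lets threads finish in any order is effectively picking `order` at random on each run, which is where the run-to-run differences come from.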
pixl97 an hour ago
If you want a deterministic LLM, just build 'Plain old software'.
KeplerBoy 3 hours ago
It's not even hard, just slow. You could do that on a single server that is cheap compared to a rack full of GPUs: run a CPU LLM inference engine and limit it to a single thread.
usernametaken29 3 hours ago
Except that one is built to be deterministic and the other is built to be probabilistic. Sure, you can technically force determinism, but it is going to be very hard. Even just making sure your GPU is indeed doing what it should be doing is going to be hard. Much like debugging a CPU, but again, one is built for determinism and the other is built for concurrency.