D-Machine 2 hours ago
> With a temperature of zero, LLM output will always be the same

Ignoring GPU indeterminism: if you are running a local LLM and control batching, yes. If you are computing via an API / in the cloud, and so being batched alongside other computations, then no (https://thinkingmachines.ai/blog/defeating-nondeterminism-in...). But yes, there is a lot of potential for semantic compression via AI models here, if we just make the effort.
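To illustrate the local case, here is a minimal sketch (not from the comment) of greedy, temperature-zero decoding with a single controlled batch, using Hugging Face transformers; the model name "gpt2" is just an illustrative placeholder. Under these assumptions, repeated runs on the same hardware should produce identical token sequences.

```python
# Minimal sketch: greedy decoding with a local model and a fixed batch of one.
# Assumes the transformers library; "gpt2" is a hypothetical choice of model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Semantic compression means"
inputs = tokenizer(prompt, return_tensors="pt")

# do_sample=False is the temperature-zero limit: always pick the argmax token.
with torch.no_grad():
    out1 = model.generate(**inputs, do_sample=False, max_new_tokens=20)
    out2 = model.generate(**inputs, do_sample=False, max_new_tokens=20)

# With batching controlled locally, the two generations should match exactly.
print(torch.equal(out1, out2))
```

Behind a cloud API you don't control how your request is batched with others, which is where the nondeterminism described in the linked post comes in.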