magicalhippo 8 hours ago:
> This isn't true

From what I understand, in practice it often is true [1]:

> Matrix multiplication should be "independent" along every element in the batch — neither the other elements in the batch nor how large the batch is should affect the computation results of a specific element in the batch. However, as we can observe empirically, this isn't true. [...] In other words, the primary reason nearly all LLM inference endpoints are nondeterministic is that the load (and thus batch-size) nondeterministically varies! This nondeterminism is not unique to GPUs — LLM inference endpoints served from CPUs or TPUs will also have this source of nondeterminism.

[1]: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
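A minimal sketch of the mechanism the linked post describes: floating-point addition is not associative, so two different reduction orders over the same numbers can round differently. A kernel that picks its reduction strategy based on batch size can therefore return slightly different results for the same input at different batch sizes. This toy example is not from the post; it just illustrates the non-associativity that makes the effect possible.

```python
# Floating-point addition is not associative: grouping the same three
# ordinary float64 values differently changes the low-order bits.
# A matmul kernel that reduces in a different order depending on batch
# size can exhibit the same kind of divergence on a much larger scale.

a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one reduction order
right = a + (b + c)  # another reduction order

print(left)           # prints 0.6000000000000001
print(right)          # prints 0.6
print(left == right)  # prints False
```

Each result is a correctly rounded sum; neither is "wrong". The nondeterminism only appears when the grouping itself varies from run to run, which is exactly what a load-dependent batch size causes.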
qeternity 1 hour ago (parent reply):
Yes, lots of things can create nondeterminism. But none of it is inherent.