The value of figuring out how to make their LLM serving deterministic might help them track this down. There was a recent paper about how the received wisdom that kept assigning it to floating point associativity actually overlooked the real reasons for non-determinism [1].

[1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

▲

ants_everywhere 4 days ago | parent | next [-]

network traffic and machine load aren't deterministic. I think for the near term, getting full determinism (e.g. for auditing) is going to only be feasible for batch jobs that are not cost sensitive.

A google search isn't deterministic. Neither is loading upvote count on social media.

It's common advice in distributed systems to have a graceful degradation state instead of becoming unavailable. That wouldn't be possible in a system that's completely deterministic.

	▲	vlovich123 4 days ago \| parent [-]
		Network traffic and machine load don’t usually impact the output of a pure (in the CS sense of purity) math function (which is what an LLM is) unless you’ve written your system to be sensitive to that. > to have a graceful degradation state instead of becoming unavailable. That wouldn't be possible in a system that's completely deterministic. What does this even mean? I see no incompatibility between determinism and your ability to perform the same function more slowly. Determinism just means that the output of the system is solely dependent on the inputs - feed the same inputs get the same outputs. If by degraded state you’re intentionally choosing to change your inputs, that doesn’t change the determinism of your system. When it is said that LLMs aren’t deterministic, it’s because the output token is dependent on the inputs context and all other contexts processed in the same batch because the kernels are written non-deterministically. If the kernels were written deterministically (so that the output only depended on your input context), then there wouldn’t be a problem and it also wouldn’t change the ability for the system to degrade; it would be deterministic because capturing the input context and random seed would be sufficient. As it stands you’d have to capture the interim states of the other inputs being processed in the same batch and that interim state problem is what makes it non deterministic. As for Google search, it’s not clear to me it’s non-deterministic. When you Google the exact same thing twice you get exactly the same page of results and selected snippets. That suggests there’s more determinism in the system than you’re giving it credit for.

▲

mmaunder 4 days ago | parent | prev [-]

It has a big impact on performance to do determinism. Which leaves using another model to essentially IQ test their models with reporting and alerting.