pavpanchekha 2 hours ago

Deterministic output is incompatible with batching, which in turn is critical to high utilization on GPUs, which in turn is necessary to keep costs low.
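The usual mechanism behind this: floating-point addition is not associative, so when a request shares a batch with different neighbors, kernels may split and order reductions differently and produce slightly different logits for the same input. A toy illustration in Python (the values are made up, just chosen to make rounding visible):

```python
# Summing the SAME values in a different order can give different
# results, because float addition rounds at each step.
vals = [0.1, 1e16, -1e16, 0.1]

# Left-to-right: the small 0.1 is absorbed when added to 1e16.
left_to_right = 0.0
for v in vals:
    left_to_right += v

# Reordered: the large values cancel first, so both 0.1s survive.
reordered = vals[1] + vals[2] + vals[0] + vals[3]

print(left_to_right, reordered)
print(left_to_right == reordered)  # the two orders disagree
```

Batching changes exactly this kind of reduction order inside matmuls and attention, which is why the same prompt can yield different tokens run to run even at temperature 0.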

avaer 27 minutes ago | parent [-]

I don't believe it. This seems more like laziness and not caring about the problem, than something fundamental. (FWIW ChatGPT agrees)

If commercial LLM providers cared about this (and I think eventually they will, there are many use cases), we'd get seed support. It might not be completely trivial given the complexity of the stacks, but it's nothing compared to what they've already accomplished. It's just a compute graph, and GPUs are actually extremely INcompatible with randomness if anything.
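For the sampling side, seed support really is simple: fix the RNG state and the sampler becomes a pure function of the logits. A minimal sketch, using a hypothetical next-token distribution:

```python
import random

# Hypothetical next-token probabilities over a 3-token vocabulary.
probs = [0.5, 0.3, 0.2]

def sample_sequence(seed, n=20):
    # Same seed -> same RNG stream -> same token choices.
    rng = random.Random(seed)
    return [rng.choices(range(len(probs)), weights=probs, k=1)[0]
            for _ in range(n)]

assert sample_sequence(42) == sample_sequence(42)
```

The catch is that seeding only pins down the sampler; full determinism also needs the logits themselves to be bitwise identical across runs, which is where batch-dependent reduction order gets in the way.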

Older LLM APIs did support seeds, but support got dropped a couple of years ago, I guess when they decided scaling fast was more important than supporting that feature.

I highly doubt anyone's kernels/inference pipeline performance is ruined by having an extra i32 or something.