lrvick | 5 days ago
Job one is to have every bit of software involved also be deterministic, which stagex takes care of. I had no problem getting deterministic LLM outputs when I experimented with this 6 months ago: run two of these with the same prompts and the same seed and you get the same results. Obviously, in GPU clusters with heterogeneous hardware, things get more complicated.
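A minimal sketch of the single-stream case being described, using a toy softmax-and-sample loop rather than any real inference engine: with identical logits and an identical RNG seed, sampling is bit-for-bit reproducible across runs.

```python
import numpy as np

def sample_tokens(logits, seed, n=5):
    """Softmax over a toy vocabulary, then sample n token ids with a seeded RNG."""
    rng = np.random.default_rng(seed)
    p = np.exp(logits - logits.max())  # numerically stable softmax
    p /= p.sum()
    return [int(rng.choice(len(p), p=p)) for _ in range(n)]

logits = np.array([2.0, 1.0, 0.5, -1.0])
run1 = sample_tokens(logits, seed=42)
run2 = sample_tokens(logits, seed=42)
assert run1 == run2  # same inputs, same seed, same math -> same tokens
```

This only holds because every floating-point operation happens in exactly the same order both times, which is precisely the property that falls apart under dynamic batching.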
spindump8930 | 5 days ago
That's not what this is about. "I had no problem getting deterministic LLM outputs when I experimented with this 6 months ago": it looks like you're using llama-cpp in that repo. This post is about vLLM serving many concurrent requests at long sequence lengths.

> As it turns out, our request’s output does depend on the parallel user requests. Not because we’re somehow leaking information across batches — instead, it’s because our forward pass lacks “batch invariance”, causing our request’s output to depend on the batch size of our forward pass.

Your situation isn't really comparable.
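The root cause the post describes is that floating-point addition is not associative, so a kernel that reduces in a different order at a different batch size produces different bits. A tiny, deterministic illustration (pure float32 arithmetic, no GPU needed): the same three values summed in two orders give two different answers.

```python
import numpy as np

# Three float32 values whose sum depends on grouping:
a = np.float32(1e8)
b = np.float32(1.0)
c = np.float32(-1e8)

# Order 1: the 1.0 is absorbed into 1e8 (below float32 precision there) and lost.
left = (a + b) + c   # -> 0.0

# Order 2: the large terms cancel first, so the 1.0 survives.
right = (a + c) + b  # -> 1.0

print(left, right)  # 0.0 1.0
```

A batched forward pass that changes its reduction tree with batch size is doing exactly this kind of reordering, which is why the same request can yield different tokens depending on who else is in the batch.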
saagarjha | 5 days ago
What’s stagex? | ||||||||