▲ | geor9e a day ago
What if you set top_p=1, temperature=0, and always run it on the same local hardware?
▲ | mkarrmann a day ago
Horace He at Thinking Machines just dropped an awesome article describing exactly this: https://thinkingmachines.ai/blog/defeating-nondeterminism-in... TL;DR: assuming you've squashed all the regular sources of non-determinism (itself a tall ask), you either need to ensure you always batch requests deterministically, or ensure all kernels are "batch invariant" (which is absolutely not common practice).
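
To make the batching point concrete, here is a toy sketch in plain Python (not from the article, and nothing like a real GPU kernel). It assumes only that a kernel may split a reduction into chunks whose size depends on the batch; `sequential_sum` and `chunked_sum` are hypothetical helper names. Because floating-point addition is not associative, the same inputs summed with different chunk sizes can give different results:

```python
def sequential_sum(values):
    """Add values strictly left to right with plain float addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk_size):
    """Sum each fixed-size chunk first, then sum the partial results.

    The chunk size stands in for a reduction strategy that changes
    with batch size (an assumption made for illustration only).
    """
    partials = [
        sequential_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sequential_sum(partials)


# Same four numbers, two different reduction orders.
values = [1e16, 1.0, -1e16, 1.0]

one_chunk = chunked_sum(values, 4)   # ((1e16 + 1) - 1e16) + 1
two_chunks = chunked_sum(values, 2)  # (1e16 + 1) + (-1e16 + 1)

print(one_chunk, two_chunks)
```

The two results differ even though the inputs are identical. With temperature=0 the sampler takes an argmax over the logits, so a difference this small can flip the chosen token whenever two logits are nearly tied.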
▲ | daemonologist a day ago
Maybe if you run it on CPU. (Maybe on GPU if all batching is disabled, but I wouldn't bet on it.)
▲ | mrheosuper a day ago
Cosmic rays will get you.