eldenring 5 days ago
Very impressive! I guess this still wouldn't fix their original example:

> For example, you might observe that asking ChatGPT the same question multiple times provides different results.

even with 0.0 temperature, because MoE models route at the batch level, and you're very unlikely to get a deterministic batch.

> Not because we’re somehow leaking information across batches — instead, it’s because our forward pass lacks “batch invariance”, causing our request’s output to depend on the batch size of our forward pass.

The router also leaks batch-level information across sequences.
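For anyone who hasn't read the post, here's a rough sketch (mine, not from the article, and the names/shapes are made up) of what "lacking batch invariance" means: the same input row can come out bitwise-different depending on how many other rows share the batch, because different batch sizes can pick different kernels or reduction orders. Whether you actually see a nonzero difference depends on your hardware, dtype, and library versions.

```python
import torch

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

d_model, d_out = 4096, 4096
weight = torch.randn(d_model, d_out, device=device, dtype=dtype)
x = torch.randn(1, d_model, device=device, dtype=dtype)

# Same row processed alone vs. packed into a larger batch of unrelated rows.
out_alone = x @ weight
filler = torch.randn(63, d_model, device=device, dtype=dtype)
out_batched = torch.cat([x, filler]) @ weight

# Bitwise comparison of the shared row. A nonzero difference means "our"
# request's output depended on the batch it happened to land in.
diff = (out_alone[0] - out_batched[0]).abs().max()
print(f"max abs difference for the same row: {diff.item()}")
```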
boroboro4 5 days ago | parent
> even with 0.0 temperature, because MoE models route at the batch level, and you're very unlikely to get a deterministic batch.

I don't think this is correct: MoE routing happens on a per-token basis. It can become non-deterministic and batch-dependent if you try to balance expert load within a batch, but that's a performance optimization (just like everything else in the blog post), not the way the models are trained to work.
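To make that concrete, here's a rough sketch of standard top-k routing (names and shapes are assumed, not any particular model's code): each token's expert assignment is a function of that token's hidden state alone, so adding or removing other sequences from the batch doesn't change which experts a given token is sent to.

```python
import torch

torch.manual_seed(0)
num_experts, top_k, d_model = 8, 2, 64
router = torch.nn.Linear(d_model, num_experts, bias=False)

def route(hidden):
    # hidden: [num_tokens, d_model] -> per-token expert indices
    logits = router(hidden)                        # router logits per token
    _, experts = logits.topk(top_k, dim=-1)        # top-k experts per token
    return experts

token = torch.randn(1, d_model)
other_tokens = torch.randn(31, d_model)

alone = route(token)
in_batch = route(torch.cat([token, other_tokens]))[:1]
# Expected True: the routing rule itself only looks at this token's logits.
print(torch.equal(alone, in_batch))
```

Batch dependence only creeps in if you bolt on something like per-batch expert capacity limits or load balancing, which can drop or re-route tokens depending on what else landed in the batch; that's a serving/throughput optimization on top of the routing function, not the routing function itself.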