boroboro4 | 5 days ago
> even with 0.0 temperature due to MOE models routing at a batch level, and you're very unlikely to get a deterministic batch.

I don't think this is correct: MoE routing happens on a per-token basis. It can become non-deterministic and batch-dependent if you balance expert load within a batch, but that's a performance optimization (like everything else in the blog post), not how the models are trained to work.
eldenring | 5 days ago | parent
Ah, interesting, good point. So I guess expert-choice routing does leak across the batch. Now I'm not sure.
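To make the distinction in this exchange concrete, here is a minimal numpy sketch (the router weights `W`, dimensions, and function names are all made up for illustration). With token-choice routing, each token picks its top-k experts from its own logits, so the assignment is independent of what else is in the batch; with expert-choice routing, each expert picks its top tokens from the whole batch, so the assignment can change when the batch composition changes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d = 4, 8
W = rng.normal(size=(d, n_experts))  # hypothetical router weights

def token_choice(x, k=1):
    """Token-choice: each token independently selects its top-k
    experts from its own logits; other tokens are irrelevant."""
    logits = x @ W                      # (n_tokens, n_experts)
    return np.argsort(logits, axis=-1)[:, -k:]

def expert_choice(x, capacity=2):
    """Expert-choice: each expert selects its top-`capacity` tokens
    across the whole batch, so which tokens an expert takes depends
    on which other tokens are present."""
    logits = x @ W                      # (n_tokens, n_experts)
    return np.argsort(logits, axis=0)[-capacity:, :]  # (capacity, n_experts)

batch = rng.normal(size=(6, d))
token = batch[:1]

# Token-choice: token 0 routes to the same expert whether it is
# processed alone or inside the larger batch.
assert np.array_equal(token_choice(token)[0], token_choice(batch)[0])

# Expert-choice: the selection is a function of the full batch, so
# token 0's membership in an expert's top-`capacity` set can change
# as other tokens are added or removed.
picks = expert_choice(batch)
print(picks.shape)  # (capacity, n_experts)
```

This is only a sketch of the two routing schemes under the stated assumptions, not the routing code of any particular model; batched load-balancing in production servers adds further batch coupling on top of this.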