Remix clone Hacker News

	▲	zozbot234 3 hours ago
		MoE has nothing whatsoever to do with specialized task solvers. Routing always operates per token within a single task; you can think of it as a kind of learned "attention" over model parameters, as opposed to over context data.
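		To make "per token" concrete, here is a rough sketch assuming a top-k softmax router, which is one common design and not something this thread specifies. The names `route_token` and `moe_layer` and the toy scaling "experts" are made up for illustration and do not come from any particular library.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token_vec, router_weights, top_k=2):
    """Pick experts for ONE token; returns [(expert_index, gate), ...]."""
    # One logit per expert: dot product of the token with that expert's router row.
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in router_weights]
    probs = softmax(logits)
    # Keep the top_k most probable experts and renormalize their gates to sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(tokens, router_weights, experts, top_k=2):
    """Apply the layer token by token; each token may hit different experts."""
    outputs = []
    for vec in tokens:
        out = [0.0] * len(vec)
        for idx, gate in route_token(vec, router_weights, top_k):
            expert_out = experts[idx](vec)
            out = [o + gate * e for o, e in zip(out, expert_out)]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    dim, num_experts = 4, 4
    # Hypothetical router: in a real model these weights are learned.
    router = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
    # Toy experts that just scale their input; real experts are feed-forward blocks.
    experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
    # Three tokens from the same "task" (one sequence).
    sequence = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
    for position, token in enumerate(sequence):
        print(position, route_token(token, router))
```

		The routing decision depends only on the individual token vector, so tokens in the same prompt can land on different experts. Nothing in the router sees a notion of "task".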
	▲	XCSme 2 hours ago
		Yes, specific weights/parameters have been trained to solve specific tasks (trained on different data). Or did I misunderstand the concept of MoE, and it's not about having specific parts of the model (parameters) do better on specific input contexts?