Remix clone Hacker News

	thesz 6 hours ago
		https://en.wikipedia.org/wiki/Mixture_of_experts#Sparsely-ga... "The sparsely-gated MoE layer,[21] published by researchers from Google Brain, uses feedforward networks as experts, and linear-softmax gating. Similar to the previously proposed hard MoE, they achieve sparsity by a weighted sum of only the top-k experts, instead of the weighted sum of all of them." Note the "top-k experts": in some of DeepSeek's models, k=1.
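		To make the quoted description concrete, here is a minimal sketch of top-k gating in plain Python. It is a toy, not any real model's code: the names top_k_moe, gate_weights and the three lambda "experts" are made up for illustration, the experts are trivial functions instead of feedforward networks, and the noise term and load-balancing losses used in practice are left out. The softmax is taken over the k selected gate logits only, which is one common way to do it.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a short list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_moe(x, gate_weights, experts, k):
    """Weighted sum of only the k highest-scoring experts (illustrative sketch)."""
    # Linear gate: one logit per expert, the dot product of a gate row with x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Indices of the k largest logits, best first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only; all other experts get weight 0
    # and are never evaluated, which is where the sparsity comes from.
    weights = softmax([logits[i] for i in top])
    out = None
    for w, i in zip(weights, top):
        y = experts[i](x)
        if out is None:
            out = [0.0] * len(y)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top

# Hypothetical toy experts standing in for feedforward networks.
experts = [
    lambda x: [2.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
    lambda x: [-v for v in x],
]
# Hypothetical gate matrix: one row per expert.
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [3.0, 1.0]

out_k1, chosen_k1 = top_k_moe(x, gate_weights, experts, k=1)
out_k2, chosen_k2 = top_k_moe(x, gate_weights, experts, k=2)
print(chosen_k1, out_k1)
print(chosen_k2, out_k2)
```

		With k=1 the softmax over a single logit is just 1.0, so the layer reduces to routing the input to one expert and returning its output unchanged. With k=2 the result is a blend of two experts, weighted by their gate scores.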