Remix clone Hacker News

	▲	Maxious 3 hours ago
		Yep. Mixture-of-Experts models are well suited to this, since only the experts relevant to a given task need to be paged in (https://huggingface.co/blog/moe). There are also experiments with simply removing or merging experts after training to shrink models even further (https://bknyaz.github.io/blog/2026/moe/).
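To make the "paging in only the relevant experts" idea concrete, here is a rough toy sketch in plain Python. It is not how any particular library does it: `ExpertCache`, `fake_load`, and `moe_forward` are hypothetical names, each "expert" is just a single scale factor standing in for a weight matrix, and the router logits are hard-coded rather than computed by a learned router.

```python
import math
from collections import OrderedDict


class ExpertCache:
    """Hypothetical LRU cache that loads expert weights only on first use."""

    def __init__(self, load_fn, capacity):
        self.load_fn = load_fn      # stand-in for reading weights from disk
        self.capacity = capacity    # max experts kept resident at once
        self.cache = OrderedDict()
        self.loads = 0              # counts how many page-ins happened

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)   # mark as recently used
            return self.cache[expert_id]
        weights = self.load_fn(expert_id)
        self.loads += 1
        self.cache[expert_id] = weights
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict least recently used
        return weights


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, k):
    """Indices of the k highest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def moe_forward(x, router_logits, cache, k=2):
    """Route x to the top-k experts and mix their outputs by gate weight."""
    chosen = top_k(router_logits, k)
    gates = softmax([router_logits[i] for i in chosen])
    out = 0.0
    for gate, expert_id in zip(gates, chosen):
        scale = cache.get(expert_id)    # only chosen experts are ever loaded
        out += gate * scale * x
    return out, chosen


def fake_load(expert_id):
    # Assumption: expert i simply multiplies its input by (i + 1).
    return float(expert_id + 1)


cache = ExpertCache(fake_load, capacity=2)
y, used = moe_forward(x=1.0, router_logits=[0.1, 2.0, -1.0, 2.0], cache=cache, k=2)
```

With four experts and k=2, only two experts are touched for this input, so `cache.loads` ends up at 2 and the other two are never read. Dropping or merging experts after training would, in this toy picture, amount to shrinking the set of ids the router can pick from.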