blourvim a day ago
I'm not really an ML dev, so I don't understand most of it; it sounds ridiculous that it would even work. Brilliant work and a great article, I enjoyed reading it. This sounds similar to Kimi's mixture-of-experts architecture, if I understood it correctly (likely I have not). Can you comment on this?
dnhkng a day ago | parent
No worries, happy to discuss anyway :) MoE (mixture of experts) is an architecture that forces sparsity (not all 'neurons' are active during the forward pass). This technique is pretty much orthogonal to that: it works with both dense and MoE models, by repeating 'vertical' sections of the transformer stack.
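To make the distinction concrete, here is a minimal sketch (toy functions standing in for transformer layers; the function names and slice indices are illustrative assumptions, not the article's actual code). A normal forward pass applies every layer once; "repeating a vertical section" re-runs a contiguous slice of the layer stack on the activations, which is independent of whether each individual layer is dense or MoE:

```python
# Toy illustration of repeating a 'vertical' slice of a layer stack.
# Each "layer" here is just a function on activations; in a real model
# it would be a transformer block (dense or MoE, either works).

def forward(x, layers):
    """Standard forward pass: every layer runs exactly once."""
    for layer in layers:
        x = layer(x)
    return x

def forward_repeated(x, layers, start, stop, repeats):
    """Re-run the contiguous slice layers[start:stop] `repeats` times."""
    for layer in layers[:start]:
        x = layer(x)
    for _ in range(repeats):
        for layer in layers[start:stop]:
            x = layer(x)
    for layer in layers[stop:]:
        x = layer(x)
    return x

# Toy layers: layer i adds i, so we can count how often each one ran.
layers = [lambda x, i=i: x + i for i in range(4)]

print(forward(0, layers))                    # 0+1+2+3 = 6
print(forward_repeated(0, layers, 1, 3, 2))  # 0 + (1+2)*2 + 3 = 9
```

MoE, by contrast, changes what happens *inside* a single layer (a router activates only some experts per token), so you can apply this kind of depth-wise repetition on top of either architecture.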