sebastiennight 4 days ago
> Isn’t back and forth exactly what the new MoE thinking models attempt to simulate?

I think the name "Mixture of Experts" might be one of the most misleading labels in our industry. No, that is not at all what MoE models do.

Think of it rather like this: instead of having one giant black box, we now have multiple smaller opaque boxes of various colors, and somehow (we don't really know how) we're able to tell whether your question is "yellow" or "purple" and send it to the box of that color to get an answer.

The result is that we're able to use fewer resources to solve any given question (by activating smaller boxes instead of the original huge one).

The problem is we don't know in advance which questions are of which color: it's not like one "expert" knows CSS and the other knows car engines. It's just more floating point black magic, so "How do I center a div" and "what's the difference between a V6 and V12" are both "yellow" questions sent to the same box/expert, while "How do I vertically center a div" is a red question, and "what's the most powerful between a V6 and V12" is a green question which activates a completely different set of weights.
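To make the "colored boxes" picture a bit more concrete, here is a toy sketch of top-k routing in plain Python. Everything in it is an assumption for illustration: the `ToyMoE` class, the random weights, the sizes, and the choice of top-2 routing are made up and not taken from any particular model. Also note that in typical sparse MoE transformers the routing decision is made per token at each MoE layer rather than once per question, so treat the input vector here as standing in for a single token.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weights; a trained model would have learned these instead."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMoE:
    """Hypothetical single MoE layer: a router plus several small experts."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # The router is just another weight matrix: one score per expert.
        self.router = make_matrix(num_experts, dim, rng)
        # Each "expert" here is a single dim x dim linear map.
        self.experts = [make_matrix(dim, dim, rng) for _ in range(num_experts)]

    def route(self, x):
        """Pick the top_k highest-scoring experts and their mixing weights."""
        logits = matvec(self.router, x)
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = ranked[: self.top_k]
        weights = softmax([logits[i] for i in chosen])
        return chosen, weights

    def forward(self, x):
        """Run only the chosen experts and blend their outputs."""
        chosen, weights = self.route(x)
        out = [0.0] * len(x)
        for idx, weight in zip(chosen, weights):
            y = matvec(self.experts[idx], x)  # unchosen experts are never evaluated
            out = [o + weight * yi for o, yi in zip(out, y)]
        return out, chosen


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for one token's vector
moe = ToyMoE(dim=8, num_experts=4, top_k=2)
output, chosen = moe.forward(x)
print("experts used:", chosen)
```

Nothing in the router knows about "CSS" or "car engines": the choice falls out of a dot product with the router weights, which is why two inputs that look related to a human can land on different experts.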