| ▲ | AntiUSAbah a day ago | |
A MoE model is one model with expert parts which use less tokens. Which makes it easier for an expert to diverge to a better optimum state. Its easier to only need to know medicin instead of everything and being able to separate everything away from medicin even if certain names, concepts etc. are the same. AI agents discussing things with each others would be more like one thinking model thinking throught the problem with different personas. With different underlying models, you can leverage the best model for one persona. Like people said before (6 month ago, no clue if this is still valid) that they prefer GPT for planning and Claude for executing / coding. | ||