XCSme 3 hours ago
But in the end, isn't this the same idea as MoE, where we have more specialized "jobs" that the model is actually trained for? I think the main difference with an agent swarm is the ability to run the agents in parallel. I don't see how this adds much compared to simply sending multiple API calls in parallel with your desired tasks. I guess the only difference is that you let the AI decide how to split those requests and what each task should be.
zozbot234 3 hours ago
Nope. MoE is strictly about model parameter sparsity. Agents are about running multiple small-scale tasks in parallel and aggregating the results for further processing. That saves a lot of context length compared to having it all in a single session, and attention compute grows quadratically with context length, so this matters. You can have both. One positive side effect: if subagent tasks can be dispatched to cheaper, more efficient edge-inference hardware that can be deployed at scale (think NVIDIA Jetsons, or even Apple Macs or AMD APUs), even though a single node is highly limited in what it can fit, then complex coding tasks ultimately become a lot cheaper per token than generic chat.
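To put rough numbers on the context-length point, here is a back-of-the-envelope sketch. It assumes attention cost scales as the square of the tokens in a session and ignores everything else (MLP layers, KV caching, dispatch overhead). The function names and the token counts are made up for illustration, not taken from any real system.

```python
def attention_cost(tokens: int) -> int:
    """Relative self-attention cost, assuming it scales as tokens**2."""
    return tokens * tokens


def single_session_cost(total_tokens: int) -> int:
    """All work happens in one long context."""
    return attention_cost(total_tokens)


def swarm_cost(total_tokens: int, num_agents: int, summary_tokens: int) -> int:
    """Work is split evenly across subagents; each hands back a short summary.

    The orchestrator then only attends over the concatenated summaries.
    """
    per_agent = total_tokens // num_agents
    workers = num_agents * attention_cost(per_agent)
    orchestrator = attention_cost(num_agents * summary_tokens)
    return workers + orchestrator


# Hypothetical workload: 100k tokens of work, 10 subagents, 1k-token summaries.
single = single_session_cost(100_000)
swarm = swarm_cost(100_000, num_agents=10, summary_tokens=1_000)
print(f"single session: {single:,}")
print(f"agent swarm:    {swarm:,}")
print(f"ratio:          {single / swarm:.1f}x")
```

Under these assumptions the split costs roughly a ninth of the single session, and the gap widens as the number of subagents grows, as long as the summaries stay short.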