pu_pe 7 hours ago

I don't get this "agent swarm" concept. You set up a task and they boot up 100 LLMs to try to do it in parallel, and then one "LLM judge" puts it all together? Is there anywhere I can read more about it?

vessenes 6 hours ago | parent | next [-]

You can read about this basically everywhere - the term of art is agent orchestration. Searching for Gas Town, Claude’s secret swarm mode, or the people who like to use phrases like “Wiggum loop” will get you there.

If you’re really lazy, the quick summary: by farming tasks out to LLMs with different instructions, you stay in the sweet spot of context length, reduce instruction overload, and pick up some parallelism benefits. The way this is generally implemented today is through tool calling, although Claude also has a skills interface it has been trained against.

So the idea would be, for software development: why not have a project/product manager spin out tasks to a bunch of agents that are primed to be good at different things? E.g. an architect, a designer, and so on. Then you just need something that can reconcile the resulting GitHub PRs and Bob’s your uncle.
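
To make that concrete, here's a minimal sketch of the pattern in Python. call_llm, the role prompts, and the final "judge" merge step are all hypothetical placeholders (the stub just returns canned text so the sketch runs as-is), not any particular product's orchestration API:

    import concurrent.futures

    # Hypothetical wrapper around whatever chat-completions API you use;
    # the stub returns canned text so the sketch runs without credentials.
    def call_llm(system_prompt: str, user_prompt: str) -> str:
        return f"<reply from '{system_prompt[:30]}...': {user_prompt[:40]}>"

    ROLES = {
        "architect": "You are a software architect. Propose the module layout.",
        "implementer": "You are a programmer. Implement the assigned piece.",
        "reviewer": "You are a code reviewer. List risks and missing tests.",
    }

    def run_swarm(goal: str) -> str:
        # "Project manager" pass: write one sub-task per specialist role.
        subtasks = {role: call_llm(
            "You are a project manager. Write the task for this role: " + role,
            goal) for role in ROLES}

        # Fan out: each specialist works in its own small context, in parallel.
        with concurrent.futures.ThreadPoolExecutor() as pool:
            futures = {role: pool.submit(call_llm, ROLES[role], task)
                       for role, task in subtasks.items()}
            results = {role: f.result() for role, f in futures.items()}

        # Reconcile: a final "judge" pass merges the specialists' output.
        return call_llm("You merge partial results into one coherent answer.",
                        "\n\n".join(f"{r}: {out}" for r, out in results.items()))

    print(run_swarm("Add rate limiting to the API gateway"))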

Gas Town takes a different approach: it parallelizes coding tasks of any sort at the base layer and uses the orchestration infrastructure to keep those coders working constantly, optimizing for minimal human input.

IanCal 5 hours ago | parent [-]

I'm not sure whether parts of this are done for Claude, but those other ones are layers on top of the usual LLMs we see. This seems to be a bit different, in that there's a separate model trained specifically for splitting up and managing the workload.

Rebuff5007 6 hours ago | parent | prev | next [-]

I've also been quite skeptical, and I became even more skeptical after hearing a tech talk from a startup in this space [1].

I think the best way to think about it is that it's an engineering hack to deal with a shortcoming of LLMs: for complex queries, LLMs are unable to directly compute a SOLUTION given a PROMPT, but they are able to break the prompt down into intermediate solutions and eventually solve the original prompt. These "orchestrator" / "swarm" agents add some formalism to this, let you distribute compute, and also let you use specialized models for some of the sub-problems.

[1] https://www.deepflow.com/
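
Roughly, the shape of that decomposition looks like this (call_llm and the model names are made-up stand-ins, stubbed instead of real API calls):

    # Stubbed stand-ins; "orchestrator-model" / "small-model" are illustrative.
    def call_llm(system: str, prompt: str, model: str = "big-model") -> str:
        return f"<{model}: {prompt[:40]}>"

    def solve(query: str) -> str:
        # 1. Break the original PROMPT into intermediate sub-questions.
        plan = call_llm("Split this into independent sub-questions, one per line.",
                        query, model="orchestrator-model")
        subqueries = [line for line in plan.splitlines() if line.strip()]

        # 2. Each sub-question can go to a cheaper / specialized model
        #    (and in a real system, in parallel).
        partials = [call_llm("Answer concisely.", q, model="small-model")
                    for q in subqueries]

        # 3. Solve the original prompt given the intermediate solutions.
        notes = "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(subqueries, partials))
        return call_llm("Use these partial answers to answer the question.",
                        f"{notes}\n\nOriginal question: {query}")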

jonkoops 7 hours ago | parent | prev | next [-]

The datacenters yearn for the chips.

rvnx 6 hours ago | parent | prev [-]

You have a team lead that establishes the list of tasks needed to achieve your mission.

Then it creates a list of employees, each specialized for one task, and they work in parallel.

Essentially, it's hiring a team of people who each specialize in one problem.

Do one thing and do it well.

XCSme 3 hours ago | parent [-]

But in the end, isn't this the same idea as MoE, where we have more specialized "jobs" that the model is actually trained for?

I think the main difference with agent swarms is the ability to run them in parallel. I don't see how this adds much compared to simply sending multiple API calls in parallel with your desired tasks. I guess the only difference is that you let the AI decide how to split those requests and what each task should be.
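
For reference, "doing it by hand" as described here is just something like this (async_call is a hypothetical stub standing in for any async chat API; the sub-task list is the part a swarm would otherwise decide for you):

    import asyncio

    # Hypothetical stub for any async chat API.
    async def async_call(system: str, prompt: str) -> str:
        return f"<answer: {prompt[:40]}>"

    async def manual_split(feature: str):
        my_own_split = [                      # the human decides the split
            "Write the database migration for " + feature,
            "Write the API endpoint for " + feature,
            "Write the tests for " + feature,
        ]
        return await asyncio.gather(
            *(async_call("You are a senior engineer.", t) for t in my_own_split))

    print(asyncio.run(manual_split("user avatars")))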

zozbot234 3 hours ago | parent [-]

Nope. MoE is strictly about model parameter sparsity. Agents are about running multiple small-scale tasks in parallel and aggregating the results for further processing - it saves a lot of context length compared to having it all in a single session, and context length has quadratic compute overhead so this matters. You can have both.
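
Back-of-the-envelope on the quadratic point, assuming attention cost scales with the square of context length and ignoring everything else:

    # If attention cost ~ n^2 in context length n, splitting one big session
    # into k independent sub-agent sessions cuts attention compute ~k-fold
    # (constants, KV-cache tricks, etc. ignored).
    def attention_cost(tokens: int) -> int:
        return tokens ** 2

    single_session = attention_cost(100_000)        # one 100k-token context
    ten_subagents = 10 * attention_cost(10_000)     # ten 10k-token contexts
    print(single_session / ten_subagents)           # -> 10.0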

One positive side effect: if subagent tasks can be dispatched to cheaper and more efficient edge-inference hardware that can be deployed at scale (think Nvidia Jetsons, or even Apple Macs or AMD APUs), even though a single node is highly limited in what it can fit, then complex coding tasks ultimately become a lot cheaper per token than generic chat.

XCSme 3 hours ago | parent [-]

Yes, I know you can have both.

My point was that this is just a different way of creating specialised task solvers, the same as with MoE.

And, as you said, with MoE it's about the model itself, and it's done at the training level, so it's not something we can easily do ourselves.

But with an agent swarm, isn't it simply splitting a task into multiple sub-tasks and sending each one in a different API call? So this could be done with any of the previous models too, only that the user had to manually define those tasks/contexts for each query.

Or is this at a much more granular level than this, which would not be feasible to be done by hand?

I was already doing this in n8n, creating different agents with different system prompts for different tasks. I'm not sure automating this (with a swarm) would work well in most of my cases, and I don't see how it fully complements Tools or Skills.

zozbot234 3 hours ago | parent [-]

MoE has nothing whatsoever to do with specialized task solvers. It always operates per token within a single task; you can think of it perhaps as a kind of learned "attention" over model parameters as opposed to context data.
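
A toy illustration of what per-token routing means (random toy weights, not a real model; the point is only that the gate picks experts for each token inside one forward pass, with no "tasks" anywhere):

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, top_k = 8, 4, 2

    gate = rng.normal(size=(d_model, n_experts))              # learned router
    experts = rng.normal(size=(n_experts, d_model, d_model))  # toy expert FFNs

    def moe_layer(tokens: np.ndarray) -> np.ndarray:          # tokens: (seq, d_model)
        out = np.zeros_like(tokens)
        for i, x in enumerate(tokens):                        # routing is per token
            scores = x @ gate                                 # (n_experts,)
            chosen = np.argsort(scores)[-top_k:]              # pick top-k experts
            weights = np.exp(scores[chosen])
            weights /= weights.sum()                          # softmax over chosen
            for w, e in zip(weights, chosen):
                out[i] += w * (experts[e] @ x)                # only k experts run
        return out

    print(moe_layer(rng.normal(size=(3, d_model))).shape)     # (3, 8)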

XCSme 3 hours ago | parent [-]

Yes, specific weights/parameters have been trained to solve specific tasks (trained on different data).

Or did I misunderstand the concept of MoE, and it's not about having specific parts of the model (parameters) do better on specific input contexts?