ProofHouse 8 days ago:
Can you elaborate on how agents degrade with more tools? Through paralysis or overuse? Isn't it, either way, a function of correctly instructing the model which tool to use when? Thanks.
lelanthran 8 days ago:
The context window is limited. Using half your context window for tool definitions means you have a 50% smaller context window for everything else.

On a large and complex system (not even a mini ERP system or a basic bookkeeping system, but a small inventory management system) you are going to have a few dozen tools, each with a description of its parameters and return values. For anything like an ERP system you are going to have a few thousand tools, which probably wouldn't even fit in the context before the user-supplied prompt. This is why the only real use case for genAI thus far is coding: with a mere 7 tools you can do everything.
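To make the arithmetic concrete, here is roughly what a single tool definition looks like in the common JSON function-calling format, with a back-of-envelope token count. A sketch only: the tool name and fields are invented for illustration, and the 4-characters-per-token ratio is a rough rule of thumb.

    import json

    # One fairly typical tool definition (hypothetical inventory tool).
    # Every definition like this is serialized into the prompt on every
    # request, whether or not the tool ends up being used.
    adjust_stock = {
        "type": "function",
        "function": {
            "name": "adjust_stock_level",
            "description": "Adjust the on-hand quantity for a SKU at a "
                           "warehouse, recording a reason code.",
            "parameters": {
                "type": "object",
                "properties": {
                    "sku": {"type": "string", "description": "Product SKU"},
                    "warehouse_id": {"type": "string"},
                    "delta": {"type": "integer", "description": "Signed change"},
                    "reason": {"type": "string",
                               "enum": ["damage", "recount", "theft"]},
                },
                "required": ["sku", "warehouse_id", "delta", "reason"],
            },
        },
    }

    # Crude estimate: ~4 characters per token.
    per_tool = len(json.dumps(adjust_stock)) // 4
    print(f"~{per_tool} tokens per tool; 50 tools ~= {per_tool * 50} tokens")

At on the order of 100-200 tokens per tool, a few dozen tools already cost thousands of tokens on every single request, before the user has typed anything.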
diggan 8 days ago:
> Can you elaborate on how agents degrade with more tools?

The more context you have in a request, the worse the performance; I think this is pretty widely established at this point. For best accuracy, you need to constantly prune the context, or just start over from the beginning.

Each tool you make available to the LLM for tool calling requires you to put its definition (name, arguments, what it's used for, and so on) into the context. So if you have 3 tools available that are all relevant to the current prompt, you'll get better responses than if you had 100 tools available where only 3 are relevant and the rest of the definitions just fill the context to no benefit.

TLDR: context grows with each tool definition; more context == worse inference, so fewer tool definitions == better responses.
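One way to act on this is to select a small relevant subset of tools per request instead of registering all of them. A minimal sketch, assuming some text-embedding function (stubbed out here) and a registry mapping tool names to their JSON definitions; none of this is any particular library's API:

    # Sketch: rank tool descriptions against the prompt and keep only
    # the top k, so the request carries 3 schemas instead of 100.

    def embed(text: str) -> list[float]:
        raise NotImplementedError  # stand-in for any embeddings API call

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    def select_tools(prompt: str, registry: dict[str, dict], k: int = 3) -> list[dict]:
        """Keep the k tool definitions whose descriptions best match the prompt."""
        q = embed(prompt)
        ranked = sorted(
            registry.values(),
            key=lambda tool: cosine(q, embed(tool["function"]["description"])),
            reverse=True,
        )
        return ranked[:k]

Only the selected definitions go into the request; everything else stays out of the context entirely.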
0x457 8 days ago:
Better if you see it for yourself. Set up the GitHub MCP server and enable all of its tools: the model will start using the wrong tools at the wrong time, and overusing them. Add languageserver-mcp and it will suddenly start trying to use it for file edits, creating a huge mess in the files. I have the NixOS MCP server available to search documentation and packages, but the model often starts using it for entirely different things. It's almost like telling someone not to think about an elephant, and they can't stop thinking about it: if you provide a tool, the model will try to use it.

That's why sub-agents are better: you can limit tool availability per agent (see the sketch below). I use the tidewave MCP server, and as soon as Claude uses a single tool from it, it becomes obsessed; I've seen it waste an entire context window running evals there without doing any file edits.
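On the sub-agent point: Claude Code, for example, lets you scope a sub-agent to an explicit tool list in its definition file under .claude/agents/, so from the main loop's point of view the other tools simply don't exist. A minimal sketch; the mcp__nixos__* tool names are illustrative and your server's actual tool names may differ:

    ---
    name: nix-docs
    description: Looks up NixOS packages and documentation. Use only for package or option lookups, never for editing files.
    tools: mcp__nixos__search, mcp__nixos__info
    ---
    You answer NixOS package and documentation questions using the search
    tools above. You have no file-editing tools; report findings as text.

Because the tools field is an allowlist, this agent's context never carries the file-editing or tidewave definitions, so it can't get obsessed with them.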
ramoz 8 days ago:
It's not just context. It is similar to paralysis, in that on every prompt the model now has to reason over more tools it might decide to use. The more tools you add, the further you drift from what the model saw in training.
datadrivenangel 8 days ago:
Imagine that with every task you received, you also got a list of all the systems and tools you had access to. A JIRA ticket description might now be several thousand lines long when the actual task description is a few sentences. The signal-to-noise ratio is bad, the risk of making mistakes goes up, and the models degrade.