> Can you elaborate on how the agents degrades from more tools?

The more context you have in the requests, the worse the performance, I think this is pretty widely established at this point. For best accuracy, you need to constantly prune the context, or just begin from the beginning.

So with that, each tool you make available to the LLM for tool calling, requires you to actually put the definition (arguments, what it's used for, the name and so on) into the context.

So if you have 3 tools available, which are all relevant to the current prompt, you'd get better responses, compared to if you had 100 tools available, where only 3 are relevant, and the rest of the definitions are just filling the context for little point.

TLDR: context grows with each tool definition, more context == worse inference, so less tool definitions == better responses.

▲

112233 8 days ago | parent | next [-]

Are there any easy to use inference frontends that support rewriting/pruning the context? Also, ideally, masking out chunks of kv-cache (e.g. old think blocks)?

Because I cannot find anything short of writing custom fork/app on top of hf transformers or llama.cpp

	▲	diggan 8 days ago \| parent [-]
		I tend to use my own "prompt management CLI" (https://github.com/victorb/prompta) to setup somewhat reusable prompts, then paste the output into whatever UI/CLI I use at the moment. Then rewriting/pruning is a matter of changing the files on disk, rerun "prompta output", create a new conversion. I basically never go beyond one user message and one assistant message, seems to degrade really quickly otherwise.

▲

danielrico 8 days ago | parent | prev [-]

I jumped off the boat of llm a little before MCP was a thing, so I thought that the tools were presented as needed by the prompt/context in a way not dissimilar of RAG. Isn't this the standard way?

	▲	jacobr1 8 days ago \| parent [-]
		You _can_ build things that way. But then you need some business logic to decide which tools to expose to the system. The easy/dumb way is just to give it all the tools. With RAG, you have retrieval step where you have hardcoded some kind of search (likely semantic) and some kind of pruning or relevance logic (maybe give the top 5 results that have at least X% relevancy matching). With tools there is no equivalent. Maybe you could try some semantic similarity to the tool description, but I don't know of any system that does that. What seems to be happening is building distinct "agents" that have a set of tools designed into them. An Agent is a system prompt+tools, where some of tools might be the ability to call/handoff to other agents. Each call to an agent is a new context, albeit with some limited context handed in from the caller agent. That way you are manually decomposing the project into a distinct set of sub-agents that can be concretely reasoned about and can perform a small set of related tasks. Then you need some kind of overall orchestration agent that can handle dispatch to other agents.