diggan 8 days ago
> Can you elaborate on how the agents degrade from more tools?

The more context you include in a request, the worse the performance; I think this is pretty widely established at this point. For best accuracy, you need to constantly prune the context, or just start over from the beginning.

Every tool you make available to the LLM for tool calling requires you to put its definition (name, arguments, what it's used for, and so on) into the context. So if you have 3 tools available that are all relevant to the current prompt, you'll get better responses than if you had 100 tools available where only 3 are relevant and the rest of the definitions just fill the context for little benefit.

TLDR: context grows with each tool definition, more context == worse inference, so fewer tool definitions == better responses.
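A rough sketch of what that looks like with an OpenAI-style chat completions API; the model name, tool names, and the 100-vs-3 numbers are illustrative only, not from the thread:

```python
# Sketch: every tool schema you pass is serialized into the model's context.
# Assumes an OpenAI-style client; tool names/descriptions are made up.
from openai import OpenAI

client = OpenAI()

ALL_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": f"tool_{i}",
            "description": f"Hypothetical tool number {i} with a long description ...",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }
    for i in range(100)
]

# Passing all 100 definitions: every schema gets tokenized into the prompt,
# even though only a handful are relevant to the user's request.
resp_bloated = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Look up the weather in Oslo"}],
    tools=ALL_TOOLS,
)

# Passing only the few tools that matter keeps the context small and
# (per the argument above) tends to give better tool selection.
resp_lean = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Look up the weather in Oslo"}],
    tools=ALL_TOOLS[:3],
)
```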
112233 8 days ago
Are there any easy-to-use inference frontends that support rewriting/pruning the context? And, ideally, masking out chunks of the KV cache (e.g. old think blocks)? I can't find anything short of writing a custom fork/app on top of HF Transformers or llama.cpp.
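For reference, a minimal client-side pruning sketch, assuming an OpenAI-style message list; it strips old think blocks and truncates the history before each request, but it cannot mask the server-side KV cache the way a patched llama.cpp would:

```python
# Illustrative client-side pruning, not a real frontend:
# drop old <think>...</think> blocks and keep only the last N turns
# before sending the history back to the model.
import re

THINK_RE = re.compile(r"<think>.*?</think>", flags=re.DOTALL)

def prune_history(messages, keep_last=8):
    pruned = []
    for msg in messages:
        content = msg.get("content") or ""
        if msg.get("role") == "assistant":
            # Remove old reasoning blocks; they rarely help later turns.
            content = THINK_RE.sub("", content).strip()
        pruned.append({**msg, "content": content})
    # Always keep the system prompt, then only the most recent turns.
    system = [m for m in pruned if m.get("role") == "system"]
    rest = [m for m in pruned if m.get("role") != "system"]
    return system + rest[-keep_last:]
```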
danielrico 8 days ago
I jumped off the LLM boat a little before MCP was a thing, so I assumed tools were presented as needed by the prompt/context, in a way not dissimilar to RAG. Isn't that the standard way?
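As far as I know, MCP itself just exposes the full tool list to the client; any RAG-style narrowing has to happen on the client side. A rough sketch of that approach, assuming sentence-transformers and a made-up tool registry:

```python
# Sketch of RAG-style tool selection: embed tool descriptions once, then
# pick only the top-k most relevant tools for the current prompt.
# The embedding model and the tool registry below are assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

TOOLS = {
    "get_weather": "Fetch the current weather forecast for a city.",
    "search_code": "Search the repository for a symbol or string.",
    "send_email": "Send an email to a recipient with a subject and body.",
    # ... imagine ~100 entries here
}

tool_names = list(TOOLS)
tool_embeddings = model.encode(list(TOOLS.values()), convert_to_tensor=True)

def select_tools(prompt, k=3):
    """Return the k tool names whose descriptions best match the prompt."""
    query_emb = model.encode(prompt, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, tool_embeddings)[0]
    top = scores.topk(k=min(k, len(tool_names)))
    return [tool_names[i] for i in top.indices.tolist()]

print(select_tools("What's the weather like in Oslo tomorrow?"))
# Only the schemas of the selected tools would then go into the request.
```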