▲ | barbazoo 7 days ago | |
> When prompting DeepSeek-v3, the team found that selecting the right tools becomes critical when you have more than 30 tools. Above 30, the descriptions of the tools begin to overlap, creating confusion. Beyond 100 tools, the model was virtually guaranteed to fail their test. Using RAG techniques to select fewer than 30 tools yielded dramatically shorter prompts and resulted in as much as 3x better tool selection accuracy.

> For smaller models, the problems begin long before we hit 30 tools. One paper we touched on last post, “Less is More,” demonstrated that Llama 3.1 8b fails a benchmark when given 46 tools, but succeeds when given only 19 tools. The issue is context confusion, not context window limitations.

A high number of tools is a bit of a "smell" to me and often makes me wonder whether the agent has too much responsibility. It's a bit like a method with so many parameters that it can do almost anything.

Have folks had success with agents like that? I've found that the fewer tools the better, e.g. fewer than 10 as a ballpark.
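For anyone wondering what "using RAG techniques to select tools" might look like in practice, here is a rough sketch. It is not the quoted team's method: the tool names and descriptions are made up, and plain bag-of-words cosine similarity stands in for whatever embedding model a real setup would use. The idea is just to score each tool description against the request and pass only the top few to the model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_tools(query, tools, k=30):
    # Rank tools by similarity between the query and each description,
    # then keep at most k tools that have some overlap with the query.
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(desc))), name) for name, desc in tools.items()
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for score, name in scored[:k] if score > 0]


# Hypothetical tool catalogue, for illustration only.
TOOLS = {
    "get_weather": "Look up the current weather forecast for a city",
    "send_email": "Send an email message to a recipient",
    "search_flights": "Search for airline flights between two cities",
    "create_invoice": "Create a billing invoice for a customer",
}

selected = select_tools("what is the weather forecast in Paris", TOOLS, k=2)
print(selected)
```

Only the tools in `selected` would have their descriptions included in the prompt, which is where the shorter prompts come from. Word overlap is a crude stand-in; it misses synonyms that an embedding-based retriever would catch.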
▲ | knewter 7 days ago | |
We have success with 39 tools, but we're introducing more focused agents and a smart router, partly because we see the writing on the wall and partly for the other benefits.
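A toy sketch of the "focused agents plus router" shape, purely as an illustration: the agent names, keywords, and tool lists below are hypothetical, and keyword matching is only a stand-in for whatever a "smart" router would actually do (quite possibly a model call). Each agent carries a small tool set, and the router picks one agent per request.

```python
# Hypothetical focused agents, each with its own small tool set.
AGENTS = {
    "billing": {
        "keywords": {"invoice", "refund", "payment"},
        "tools": ["create_invoice", "issue_refund"],
    },
    "travel": {
        "keywords": {"flight", "hotel", "booking"},
        "tools": ["search_flights", "book_hotel"],
    },
}


def route(request, agents, default="general"):
    # Pick the agent whose keywords overlap most with the request words;
    # fall back to a default agent when nothing matches.
    words = set(request.lower().split())
    best, best_hits = default, 0
    for name, spec in agents.items():
        hits = len(words & spec["keywords"])
        if hits > best_hits:
            best, best_hits = name, hits
    return best


agent = route("I need a refund for my last payment", AGENTS)
print(agent, AGENTS[agent]["tools"])
```

The point of the shape is that no single agent ever sees the full tool list, so each prompt stays well under the tool counts where selection reportedly degrades.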