dragonwriter | 8 days ago
> These models are usually very decent at parsing out stuff like that anyway; we don't need the MCP spec, everyone can just specify the available tools in natural language and then we can expect large param models to just "figure it out".

This is mostly the kind of misunderstanding of MCP that the article seems directed at, and much of this response is focused on points that are central to the article, but: MCP isn't for the models, it is for the toolchains supporting them.

The information models actually need about tools and resources is fetched from the server by the toolchain, using the information in the MCP. The structure models consume varies by model, but it is consistently quite different information from what is in the MCP; the tool and resource (but probably not prompt) names from the MCP will probably also be given to the model, but that's pretty much the only direct overlap. MCP can also define prompts for the toolchain, but information about those is more likely presented directly to the user than to the model itself.

The toolchain also needs to know how the model was trained to receive tool information in its prompt, just as it needs to know other aspects of the model's preferred prompt template, but that is a separate concern from MCP.

> If MCP had been a specification for _training_ models to support tool use on an architectural level, not just training it to ask to use a tool with a special token as they do now.

MCP isn't a specification for training anything. MCP is a specification for providing the toolchain running the LLM with information about tools external to that toolchain. Tools internal to the toolchain never use MCP because, again, MCP isn't for the model, it's for the toolchain.
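To make the "MCP is for the toolchain" point concrete, here is a minimal sketch of the mediation being described. The function names and the exact dict shapes are hypothetical illustrations, not real MCP SDK calls; the idea is only that the toolchain fetches tool info in MCP's shape and reshapes it into whatever the target model's API expects, with the tool name being the main field that survives the translation directly.

```python
# Hypothetical sketch of a toolchain mediating between an MCP server
# and a model API. None of these names come from a real SDK.

def mcp_list_tools():
    # Roughly the shape an MCP server advertises for its tools.
    return [{
        "name": "get_sp500",
        "description": "Fetch the current S&P 500 index value",
        "inputSchema": {"type": "object", "properties": {}},
    }]

def to_model_schema(mcp_tools):
    # The toolchain reshapes MCP tool info into the structure the
    # target model/API expects; only the name overlaps directly.
    return [{
        "name": t["name"],
        "description": t["description"],
        "input_schema": t["inputSchema"],
    } for t in mcp_tools]

tools_for_api = to_model_schema(mcp_list_tools())
print(tools_for_api[0]["name"])  # get_sp500
```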
fennecfoxy | 5 days ago | parent
You've replied multiple times invoking "toolchains" without explaining what they are.

I've seen that for models that don't support tool defs via the API, those tool defs are provided in the context (though the model is still trained for tool use, emitting the special python_call/x tokens to indicate a tool call in its output).

I can see, for example, that MCP's own example using Anthropic uses their API/SDK's tools section, as outlined here: https://docs.anthropic.com/en/api/messages#body-tools. What the example does is shove the tool definition in there; this includes the full name, description, etc. of the tool. Quoting them: "And then asked the model "What's the S&P 500 at today?", the model might produce tool_use content blocks in the response". So I imagine that behind the scenes they're _smashing it into the context_ as I already suggested; the only reason it's separate in the API is so they can type/validate it.

I don't know what this magical toolchain is, but the LLM is the thing producing output based on the not-so-new concepts of attention and statistics. I don't see how some separate "toolchain" piece takes the input string and somehow does a better job at selecting a tool than the model itself, unless the toolchain is itself a smaller LLM trained specifically for tool use, outside of your larger multi-purpose/"knowledgeable" LLM.
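For what "smashing it into the context" could look like, here is a hedged sketch of flattening a structured `tools` parameter into plain prompt text for a model with no native tool-def API. The rendering format below is purely illustrative; it is not a claim about what Anthropic or any other provider actually does behind the scenes.

```python
# Hypothetical: render structured tool definitions as plain context
# text, the way a toolchain might for a model lacking a tools API.

def tools_to_prompt(tools):
    # One line per tool: name plus description, readable in-context.
    lines = ["You have access to the following tools:"]
    for t in tools:
        lines.append(f"- {t['name']}: {t['description']}")
    lines.append("To call a tool, emit its name and JSON arguments.")
    return "\n".join(lines)

tools = [{
    "name": "get_stock_price",
    "description": "Get the current price for a stock ticker",
    "input_schema": {"type": "object",
                     "properties": {"ticker": {"type": "string"}}},
}]

print(tools_to_prompt(tools))
```

The separation in the API still buys typing and validation, as the comment notes, even if the model ultimately sees it as tokens in context.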