libraryofbabel | 4 days ago
You have the right picture of what's going on. Roughly:

* The only true interface with an LLM is tokens. (There is no separation between control and data channels.)
* The model API layer injects instructions on tool calling and a list of available tools into the base prompt, with documentation on what those tools do.
* Tool calling is delineated by special tokens. When a model wants to call a tool, it adds a special block to the response containing the magic token(s) along with the name of the tool and any params. The API layer then extracts this and turns it into a structured JSON object in some tool_calls field (or whatever the provider calls it) that is sent back in the API response to the user. The tool result coming back from the user through the tool-calling API is then encoded with special tokens and injected into the context. (See the two sketches after this list for what each side of that looks like.)
* Presumably, the API layer prevents the user from injecting such tokens themselves.
* SotA models are good at tool calls because they have been heavily fine-tuned on them, with all sorts of tasks that involve tool calls, like bash invocations. The fine-tuning is both to make them good at tool calls in general, and probably also covers specific tools the model provider wants them to be good at, such as Claude Sonnet being fine-tuned on the specific tools Claude Code uses.

Sometimes it amazes me that this all works so well, but it does. You are right to put your finger on the fine-tuning, as it's critical for making tool calling work well. Tool calling works without fine-tuning, but it's going to be more hit-or-miss.
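First sketch: a toy illustration of the extraction step the API layer performs on the model's raw output. The `<|tool_call|>` / `<|/tool_call|>` markers and the `get_weather` tool are invented for illustration; every model family uses its own special tokens, but the parse-and-repackage step looks roughly like this.

```python
# Toy sketch of the API layer turning a special-token block into structured JSON.
# The <|tool_call|> markers and get_weather tool are made up for illustration.
import json
import re

raw_model_output = (
    "Let me check that for you.\n"
    '<|tool_call|>{"name": "get_weather", "arguments": {"city": "Oslo"}}<|/tool_call|>'
)

match = re.search(r"<\|tool_call\|>(.*?)<\|/tool_call\|>", raw_model_output, re.DOTALL)
if match:
    call = json.loads(match.group(1))
    # This is roughly what gets surfaced to the user as the tool_calls field.
    tool_calls = [{
        "type": "function",
        "function": {"name": call["name"], "arguments": json.dumps(call["arguments"])},
    }]
    print(tool_calls)
```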
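Second sketch: the same round trip as seen from the user side of the API, assuming an OpenAI-style chat completions API (openai>=1.x Python SDK). The `get_weather` tool, its schema, and the fake weather result are all stand-ins; the point is the flow: tool definitions go in, a structured tool_calls object comes out, and the tool result goes back as a message the API layer re-encodes with special tokens before the model's next turn.

```python
# Minimal sketch of one tool-call round trip, assuming an OpenAI-style API.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Oslo?"}]

# The API layer injects these tool definitions into the prompt behind the scenes.
resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = resp.choices[0].message

# If the model emitted a tool-call block, it surfaces here as structured JSON.
if msg.tool_calls:
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)  # e.g. {"city": "Oslo"}

    # Run the tool ourselves and send the result back; the API layer encodes it
    # with the model's special tokens before the next generation.
    result = {"temp_c": 7, "conditions": "rain"}  # stand-in for a real lookup
    messages.append(msg)
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": json.dumps(result),
    })
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```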