nlawalker 8 days ago
> writing bash commands into my terminal
This is what the author means by "knowing how to use the tool". The LLM alone is effectively a function that outputs text. It has no other capabilities, and it cannot "connect to" or "use" anything by itself. The closest it can come is outputting an unambiguous, structured text request that the application code wrapping it can interpret and act on. The author's point hinges on the architectural distinction between the LLM itself and that application code. That distinction is increasingly irrelevant and invisible to most people (even developers), because the application code that knows how to do things like call MCP servers is already baked into most LLM-driven products and services. No one is "talking directly to" an LLM; it's all mediated by multiple layers, including layers that perform tool calling.
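To make the split concrete, here is a rough sketch of the two layers. Everything in it is made up for illustration: `fake_llm` stands in for a model and just returns a string, `run_shell` is a pretend tool that executes nothing, and the JSON shape is an assumption, not the MCP wire format or any vendor's API.

```python
import json


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model: all it can do is return text."""
    return json.dumps({"tool": "run_shell", "arguments": {"command": "echo hello"}})


def run_shell(command: str) -> str:
    # Hypothetical tool. It only describes the action instead of executing it.
    return f"(pretend) ran: {command}"


# The wrapper application, not the model, decides which tools exist.
TOOLS = {"run_shell": run_shell}


def handle_model_output(text: str, tools: dict) -> str:
    """Application code: interpret the model's text and act on its behalf."""
    try:
        request = json.loads(text)
    except json.JSONDecodeError:
        return text  # ordinary prose, pass it through unchanged
    if not isinstance(request, dict) or "tool" not in request:
        return text
    tool = tools.get(request["tool"])
    if tool is None:
        # The model emitted a well-formed request, but nothing is wired up.
        return f"no tool named {request['tool']!r} is connected"
    return tool(**request.get("arguments", {}))


output = fake_llm("say hello")
print(handle_model_output(output, TOOLS))  # wrapper dispatches the request
print(handle_model_output(output, {}))     # same text, nothing connected
```

The model's output is the same string in both calls; only the wrapper's tool table changes what happens next.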
creddit 8 days ago
I understood the gist of what the author is trying to say, and ultimately this all comes down to a matter of philosophy. My post is mostly tongue in cheek, poking lightheartedly at the moving goalposts of what "LLMs know how to do". The only fundamental part of what they said that I would call unambiguously false is the first sentence: the LLM (already itself hard to define!) fundamentally does know how to use tools through its expected interface. That the interface may not be connected to anything isn't really the LLM's fault, nor does it say anything about the knowledge and understanding the LLM has. An analogy would be "humans don't have native tool calling abilities, all they can do is press physical keys that represent a function call". I too don't have the ability to natively control a computer, in the same sense that the LLM doesn't. If the keyboard is disconnected from the computer, then I too will just emit keypresses into the void, much like an LLM will emit tool call tokens into a void where they are not linked to an MCP-like interface.