creddit 8 days ago

> But here’s the important part: LLMs don’t know how to use tools. They don’t have native tool calling support. They just generate text that represents a function call.

This terrifies me. This whole time I was writing bash commands into my terminal, I thought I knew how to use the tools. Now, I’ve just learned that I had no idea how to use tools at all! I just knew how to write text that /represented/ tool use.

nlawalker 8 days ago | parent | next [-]

> writing bash commands into my terminal

This is what the author means by "knowing how to use the tool". The LLM alone is effectively a function that outputs text; it has no other capabilities and cannot "connect to" or "use" anything by itself. The closest it can come is outputting an unambiguous, structured text request that can be interpreted by the application code that wraps it and does something on its behalf.

The author's point hinges on the architectural distinction between the LLM itself and that application code, which is increasingly irrelevant and invisible to most people (even developers) because the application code that knows how to do things like call MCP servers is already baked into most LLM-driven products and services. No one is "talking directly to" an LLM; it's all mediated by multiple layers, including layers that perform tool calling.
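To make that distinction concrete, here is a minimal, hypothetical sketch (all names made up): the "LLM" is nothing but a text-in, text-out function, and the wrapper code is what actually interprets and executes the tool call on its behalf.

```python
import json

def fake_llm(prompt: str) -> str:
    """Stand-in for a model call; a real LLM would generate this text."""
    return '{"tool": "get_weather", "arguments": {"city": "Oslo"}}'

# The wrapper, not the model, owns the actual capabilities.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # placeholder implementation

TOOLS = {"get_weather": get_weather}

def run(prompt: str) -> str:
    raw = fake_llm(prompt)          # model emits text that *represents* a call
    call = json.loads(raw)          # application code interprets it...
    fn = TOOLS[call["tool"]]        # ...looks up the real function...
    return fn(**call["arguments"])  # ...and executes it on the model's behalf

print(run("What's the weather in Oslo?"))
```

If `TOOLS` is empty, the model still emits exactly the same text; nothing about the model itself changes, which is the layering the comment is pointing at.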

creddit 8 days ago | parent [-]

I understood the gist of what the author is trying to say, and ultimately this all comes down to a matter of philosophy. My post is mostly tongue-in-cheek, poking lightheartedly at the moving goalposts of what "LLMs know how to do". The only part of what they said that I would call unambiguously false is the first sentence: the LLM (already itself hard to define!) fundamentally does know how to use tools through its expected interface. That the interface may not be connected to anything isn't the LLM's fault, nor does it say anything about the knowledge and understanding the LLM has.

An analogy would be "humans don't have native tool calling abilities, all they can do is press physical keys that represent a function call". I too don't have the ability to natively control a computer, in the same sense that the LLM doesn't. If the keyboard is disconnected from the computer, then I too will just emit keypresses into the void, much like an LLM will emit tool call tokens into a void where they are not linked to an MCP-like interface.

jerf 8 days ago | parent | prev | next [-]

A lot of people resist the idea that programming is intrinsically mathematical, but this is one of the places it pops out. The power of programming lies precisely in the way it brings together text that "represents" something with text that "does" something. That is, at the core, the source of its power. You can still draw the distinction philosophically, as you just did, but at the same time there is also a profound way in which there is in fact no difference between "using" computers and "representing" your use of computers.

fennecfoxy 8 days ago | parent | prev | next [-]

I think what your quote is trying to say essentially boils down to this: LLMs can be given facts and tool definitions in the context; we _hope_ that the statistical model picks up on that information and makes the tool calls, but it isn't _guaranteed_.

Unlike human beings such as yourself (presumably), LLMs do not have agency; they do not have conscious or active thought. All they do is predict the next token.

I've thought about the above a lot, these models are certainly capable of a lot, but they do not in any form or fashion emulate the consciousness that we have. Not yet.

johnmaguire 8 days ago | parent | prev [-]

I think you might be missing the point of this quote, which is that you don't have to introduce additional code into the model to support MCP.

MCP happens at a different layer. You have to run the MCP commands. Or use a client that does it for you:

> But the LLM will never know you are using MCP, unless you are letting it know in the system prompt or tool definitions. You, the developer, are responsible for calling the tools. The LLM only generates a snippet of what tool(s) to call with which input parameters.

The article is describing how MCP works, not making an argument about what it means to "understand" something.
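The loop the quote describes can be sketched roughly like this (a hedged illustration; `mock_llm`, `TOOL_DEFS`, and the message shapes are all invented for the example, not any real API): tool definitions go into the prompt, the model emits a call as text, and the developer's code runs the tool and feeds the result back.

```python
import json

# Hypothetical tool definitions, serialized into the system prompt.
TOOL_DEFS = [{"name": "add", "description": "Add two integers",
              "parameters": {"a": "int", "b": "int"}}]

def add(a: int, b: int) -> int:
    return a + b

REGISTRY = {"add": add}

def mock_llm(messages):
    """Stand-in for the model: first asks for a tool, then answers."""
    if messages[-1]["role"] == "tool":
        return {"content": f"The answer is {messages[-1]['content']}."}
    return {"tool_call": {"name": "add", "arguments": {"a": 2, "b": 3}}}

def chat(user_msg: str) -> str:
    messages = [
        {"role": "system", "content": "Tools: " + json.dumps(TOOL_DEFS)},
        {"role": "user", "content": user_msg},
    ]
    while True:
        reply = mock_llm(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # plain text: conversation is done
        # The developer's code, not the model, executes the tool:
        result = REGISTRY[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "content": str(result)})

print(chat("What is 2 + 3?"))  # prints "The answer is 5."
```

Whether the registry dispatches to a local function or to an MCP server is invisible to the model; it only ever sees the text in `messages`.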

creddit 8 days ago | parent [-]

Yeah, I think you're right. That's the more likely interpretation: the writer probably isn't actually making a claim about understanding.