catlifeonmars 5 days ago

> Have the LLM talk about what “truth” is and the nature of LLM hallucinations and it can cook up an explanation that demonstrates it completely understands the concepts.

There is not necessarily a connection between what an LLM understands and what it says. It’s totally possible to emit text that is logically consistent without understanding. As a trivial example, just quote from a physics textbook.

I’m not saying your premise is necessarily wrong: that LLMs can understand the difference between truth and falsehood. All I’m saying is you can’t infer that from the simple test of talking to an LLM.

ninetyninenine 5 days ago

>There is not necessarily a connection between what an LLM understands and what it says. It’s totally possible to emit text that is logically consistent without understanding. As a trivial example, just quote from a physics textbook.

This is true, but you could say the same thing about a human too, right? There's no way to establish a connection between what a human says and whether or not that human understands something. Right? We can't do mind reading here.

So how do we determine whether or not a human understands something? Based on what the human tells us. So I'm just extrapolating that concept to the LLM. It knows things. Does it matter what the underlying mechanism is? If we get LLM output to be perfect in every way but the underlying mechanism is still feed-forward networks doing token prediction, then I would still say it "understands", because that's the EXACT metric we use to determine whether a human "understands" things.

>I’m not saying your premise is necessarily wrong: that LLMs can understand the difference between truth and falsehood. All I’m saying is you can’t infer that from the simple test of talking to an LLM.

Totally understood. And I didn't say that it knew the difference. I was saying basically a different version of what you're saying.

You say: We can't determine if it knows the difference between truth and falsehood. I say: We can't determine if it doesn't know the difference between truth and falsehood.

The two statements don't contradict each other. The parent commenter imo was making a definitive statement, in that he claims we know it doesn't understand, and I was just contradicting that.