ninetyninenine 5 days ago

But this explanation doesn’t fully characterize it, does it?

Have the LLM talk about what “truth” is and the nature of LLM hallucinations and it can cook up an explanation that demonstrates it completely understands the concepts.

Additionally, when the LLM responds, MOST of the answers are true even though quite a few are wrong. If it had no conceptual understanding of truth then the majority of its answers would be wrong, because there are overwhelmingly far more wrong responses than true ones. Even a “close” hallucination has a low probability of occurring due to its proximity to a low-probability region of truth in the vectorized space.

You’ve been having trouble conveying these ideas to relatives because it’s an inaccurate characterization of phenomena we don’t understand. We do not categorically fully understand what’s going on with LLMs internally and we already have tons of people similar to you making claims like this as if it’s verifiable fact.

Your claim here cannot be verified. We do not know if LLMs know the truth and they are lying to us or if they are in actuality hallucinating.

You want proof that your statement can’t be verified? The article the parent commenter is responding to says the exact fucking opposite. OpenAI makes an opposing argument, and it can go either way because we don’t have definitive proof either way. The article says that LLMs are “guessing”, that it’s an incentive problem where LLMs are inadvertently incentivized to guess, and that if you incentivize the LLM to not confidently guess and to be more uncertain, the outcomes will change to what we expect.

Right? If it’s just an incentive problem it means the LLM does know the difference between truth and uncertainty and that we can coax this knowledge out of the LLM through incentives.

kolektiv 5 days ago | parent | next [-]

But an LLM is not answering "what is truth?". It's "answering" "what does an answer to the question "what is truth?" look like?".

It doesn't need a conceptual understanding of truth - yes, there are far more wrong responses than right ones, but the right ones appear more often in the training data and so the probabilities assigned to the tokens which would make up a "right" one are higher, and thus returned more often.
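A toy sketch of that point (the prompt and probabilities here are made up, not taken from any real model): if the "right" continuation dominates the next-token distribution because it dominated the training data, sampling returns it most of the time, even though wrong continuations vastly outnumber it.

```python
import random

# Hypothetical next-token distribution for "The capital of France is":
# the "right" token dominates because it appears far more often in
# training data, even though wrong continuations outnumber it.
token_probs = {"Paris": 0.90, "Lyon": 0.04, "Berlin": 0.03, "Madrid": 0.03}

def sample_token(probs, rng):
    """Draw one token from a probability distribution."""
    r = rng.random()
    cum = 0.0
    for tok, p in probs.items():
        cum += p
        if r < cum:
            return tok
    return tok  # floating-point edge case: return the last token

rng = random.Random(0)
samples = [sample_token(token_probs, rng) for _ in range(1000)]
# The overwhelming majority of samples are "Paris" -- no notion of
# truth required, just higher probability mass on the right answer.
print(samples.count("Paris") / len(samples))
```

No concept of truth enters anywhere in that loop; the "right" answer wins purely on probability mass.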

You're anthropomorphizing in using terms like "lying to us" or "know the truth". Yes, it's theoretically possible I suppose that they've secretly obtained some form of emergent consciousness and also decided to hide that fact, but there's no evidence that makes that seem probable - to start from that premise would be very questionable scientifically.

A lot of people seem to be saying we don't understand what it's doing, but I haven't seen any credible proof that we don't. It looks miraculous to the relatively untrained eye - many things do, but just because I might not understand how something works, it doesn't mean nobody does.

rambambram 5 days ago | parent | next [-]

Nice to read some common sense in a friendly way. I follow your RSS feed, please keep posting on your blog. Unless you're an AI and secretly obtained some form of emergent consciousness, then not.

ninetyninenine 5 days ago | parent | prev [-]

>But an LLM is not answering "what is truth?". It's "answering" "what does an answer to the question "what is truth?" look like?".

You don't actually know this, right? You said what I'm saying is theoretically possible, so you're contradicting yourself.

>You're anthropomorphizing in using terms like "lying to us" or "know the truth". Yes, it's theoretically possible I suppose that they've secretly obtained some form of emergent consciousness and also decided to hide that fact, but there's no evidence that makes that seem probable - to start from that premise would be very questionable scientifically.

Where did I say it's conscious? You hallucinated here thinking I said something I didn't.

Just because you can lie doesn't mean you're conscious. For example, a sign can lie to you. If the speed limit is 60 but there's a sign that says the speed limit is 100 then the sign is lying. Is the sign conscious? No.

Knowing is a different story though. But think about this carefully. How would we determine whether a "human" knows anything? We can only tell whether a "human" "knows" things based on what it tells us. Just like an LLM. So based off of what the LLM tells us, it's MORE probable that the LLM "knows", because that's the SAME exact reasoning by which we tell a human "knows". There's no other way we can determine whether or not an LLM or a human "knows" anything.

So really I'm not anthropomorphizing anything. You're the one that's falling for that trap. Knowing and lying are not unique concepts to conciousness or humanity. These are neutral concepts that exist beyond what it means to be human. When I say something, "knows" or something "lies" I'm saying it from a highly unbiased and netural perspective. It is your bias that causes you to anthropomorphize these concepts with the hallucination that these are human centric concepts.

>A lot of people seem to be saying we don't understand what it's doing, but I haven't seen any credible proof that we don't.

Bro. You're out of touch.

https://www.youtube.com/watch?v=qrvK_KuIeJk&t=284s

Hinton, the godfather of modern AI, says we don't understand. It's not just random people saying we don't understand. The general understanding within academia is: we don't understand LLMs. So you're wrong. You don't know what you're talking about and you're highly misinformed.

zbentley 5 days ago | parent [-]

I think your assessment of the academic take on AI is wrong. We have a rather thorough understanding of the how/why of the mechanisms of LLMs, even if after training their results sometimes surprise us.

Additionally, there is a very large body of academic research that digs into how LLMs seem to understand concepts and truths and, sure enough, examples of us making point edits to models to change the “facts” that they “know”. My favorite of that corpus, though far from the only or most current/advanced research, is the Bau Lab’s work: https://rome.baulab.info/
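A loose illustration of the "point edit" idea (this is a simplified sketch, not the actual ROME method; the shapes, names, and the key-to-value framing are assumptions for the example): if a layer is treated as a linear key-to-value store, a rank-one update can rewrite the value returned for one key.

```python
import numpy as np

# Toy linear "fact store": W maps a key vector (an encoded subject,
# e.g. "the Eiffel Tower is in ...") to a value vector (the fact).
rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))   # layer weights
k = rng.normal(size=d)        # key encoding one fact's subject
v_new = rng.normal(size=d)    # desired replacement value

# Rank-one update chosen so that W_edited @ k == v_new exactly,
# while keys nearly orthogonal to k are barely affected.
W_edited = W + np.outer(v_new - W @ k, k) / (k @ k)

assert np.allclose(W_edited @ k, v_new)
```

The algebra checks out term by term: `W_edited @ k = W @ k + (v_new - W @ k) * (k @ k) / (k @ k) = v_new`. The real research adds machinery for locating which layer stores a fact and constraining the edit, but the rank-one rewrite is the core trick.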

ninetyninenine 5 days ago | parent | next [-]

It’s not about what you think; it’s about who’s factually right or wrong.

You referenced work on model interpretability, which is essentially the equivalent of putting an MRI or electrodes on a human brain and saying we understand the brain because some portion of it lights up when we show it a picture of a cow. There’s lots of work on model interpretability, just like there’s lots of science involving brain scans of the human brain… the problem is none of this gives insight into how the brain or an LLM works.

In terms of understanding LLMs we overall don’t understand what’s going on. It’s not like I didn’t know about attempts to decode what’s going on in these neural networks… I know all about it, but none of it changes the overall sentiment of: we don’t know how LLMs work.

This is fundamentally different from computers. We know how computers work such that we can emulate a computer. But for an LLM we can’t fully control it, we don’t fully understand why it hallucinates, we don’t understand how to fix the hallucination and we definitely cannot emulate an LLM in the same way we do for a computer. It isn’t just that we don’t understand LLMs. It’s that there isn’t anything in the history of human invention that we lack such fundamental understanding of.

Off of that logic, the facts are unequivocally clear: we don’t understand LLMs and your statement is wrong.

But it goes beyond this. I’m not just saying this. This is the accepted general sentiment in academia and you can watch that video of Hinton, the godfather of AI in academia basically saying the exact opposite of your claim here. He literally says we don’t understand LLMs.

cindyllm 5 days ago | parent [-]

[dead]

riwsky 4 days ago | parent | prev [-]

Here’s where you're clearly wrong. The correct favorite in that corpus is Golden Gate Claude: https://www.anthropic.com/news/golden-gate-claude

zbentley 3 days ago | parent [-]

Both are very good! I usually default to sharing the Bau Lab's work on this subject rather than Anthropic's because a) it's a little less fraught when sharing with folks who are skeptical of commercial AI companies, and b) Bau's linked research/notebooks/demos/graphics are a lot more accessible to different points on the spectrum between "machine learning academic researcher" and "casual reader"; "Scaling/Towards Monosemanticity" are both massive and, depending on the section, written for pretty extreme ends of the layperson/researcher spectrum.

The Anthropic papers also cover a lot more subjects (e.g. feature splitting, discussion on use in model moderation, activation penalties) than Bau Lab's, as well--which is great, but maybe not when shared as a targeted intro to interpretability/model editing.

Jensson 5 days ago | parent | prev | next [-]

> Have the LLM talk about what “truth” is and the nature of LLM hallucinations and it can cook up an explanation that demonstrates it completely understands the concepts.

This isn't how an LLM works. What an LLM understands has nothing to do with the words it says; it only has to do with what connections it has seen.

If an LLM has only seen a manual but has never seen examples of how the product is used, then it can tell you exactly how to use the product by writing out info from the manual, but if you ask it to do those things then it won't be able to, since it has no examples to go by.

This is the primary misconception most people have, and it makes them overestimate what their LLM can do: no, they don't learn by reading instructions, they only learn by seeing examples and then doing the same thing. So an LLM talking about truth just comes from it having seen others talk about truth, not from it thinking about truth on its own. This is fundamentally different from how humans think about words.

ninetyninenine 5 days ago | parent [-]

>This isn't how LLM works.

I know how an LLM works. I've built one. At best we only know surface-level stuff, like the fact that it involves a feed-forward network and is using token prediction.
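The "surface level stuff" both sides agree on (token prediction wrapped around a network) can be sketched like this; the toy lookup table stands in for the real network, which is the part under dispute:

```python
def toy_model(context):
    """Stand-in for the network: maps a context to next-token
    probabilities. In a real LLM this is the part whose internal
    behavior is hard to explain; the outer loop below is not."""
    table = {
        ("the",): {"cat": 0.6, "dog": 0.4},
        ("the", "cat"): {"sat": 0.9, "ran": 0.1},
        ("the", "cat", "sat"): {"<eos>": 1.0},
    }
    return table.get(tuple(context), {"<eos>": 1.0})

def generate(prompt, max_tokens=10):
    """Greedy decoding: repeatedly append the most probable token."""
    out = list(prompt)
    for _ in range(max_tokens):
        probs = toy_model(out)
        tok = max(probs, key=probs.get)
        if tok == "<eos>":
            break
        out.append(tok)
    return out

print(generate(["the"]))  # ['the', 'cat', 'sat']
```

The loop is fully understood; the open question is why the learned distribution inside it produces coherent, concept-tracking output.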

But the emergent effect of how an LLM produces an overall statement that reflects high-level conceptual understanding is something we don't know.

So your claim of "This isn't how an LLM works", said with such confidence, is utterly wrong. You don't know how it works; no one does.

catlifeonmars 5 days ago | parent | prev [-]

> Have the LLM talk about what “truth” is and the nature of LLM hallucinations and it can cook up an explanation that demonstrates it completely understands the concepts.

There is not necessarily a connection between what an LLM understands and what it says. It’s totally possible to emit text that is logically consistent without understanding. As a trivial example, just quote from a physics textbook.

I’m not saying your premise is necessarily wrong: that LLMs can understand the difference between truth and falsehood. All I’m saying is you can’t infer that from the simple test of talking to an LLM.

ninetyninenine 5 days ago | parent [-]

>There is not necessarily a connection between what an LLM understands and what it says. It’s totally possible to emit text that is logically consistent without understanding. As a trivial example, just quote from a physics textbook.

This is true, but you could say the same thing about a human too right? There's no way to say there's a connection between what a human says and whether or not a human understands something. Right? We can't do mind reading here.

So how do we determine whether or not a human understands something? Based off of what the human tells us. So I'm just extrapolating that concept to the LLM. It knows things. Does it matter what the underlying mechanism is? If we get LLM output to be perfect in every way but the underlying mechanism is still feed forward networks with token prediction then I would still say it "understands" because that's the EXACT metric we use to determine whether a human "understands" things.

>I’m not saying your premise is necessarily wrong: that LLMs can understand the difference between truth and falsehood. All I’m saying is you can’t infer that from the simple test of talking to an LLM.

Totally understood. And I didn't say that it knew the difference. I was saying basically a different version of what you're saying.

You say: We can't determine if it knows the difference between truth and falsehood. I say: We can't determine if it doesn't know the difference between truth and falsehood.

Neither statement contradicts each other. The parent commenter imo was making a definitive statement in that he claims we know it doesn't understand and I was just contradicting that.