| ▲ | wavemode 5 days ago |
| Indeed - as Rebecca Parsons puts it, all an LLM knows how to do is hallucinate. Users just tend to find some of these hallucinations useful, and some not. |
|
| ▲ | saghm 5 days ago | parent | next [-] |
This is a super helpful way of putting it. I've tried to explain to my less technical friends and relatives that from the standpoint of an LLM, there's no concept of "truth": it basically just comes up with the shape of what a response should look like and then fills in the blanks with pretty much anything it wants. My success in getting the point across has been mixed, so I'll need to try out this much more concise way of putting it next time!
| |
▲ | ninetyninenine 5 days ago | parent [-] | | But this explanation doesn’t fully characterize it, does it? Have the LLM talk about what “truth” is and the nature of LLM hallucinations and it can cook up an explanation that demonstrates it completely understands the concepts. Additionally, when the LLM responds, MOST of the answers are true even though quite a few are wrong. If it had no conceptual understanding of truth, then the majority of its answers would be wrong, because there are overwhelmingly far more wrong responses than there are true responses. Even a “close” hallucination has a low probability of occurring, due to its proximity to a low-probability region of truth in the vectorized space. You’ve been having trouble conveying these ideas to relatives because it’s an inaccurate characterization of phenomena we don’t understand. We do not categorically, fully understand what’s going on with LLMs internally, and we already have tons of people similar to you making claims like this as if they’re verifiable fact. Your claim here cannot be verified. We do not know if LLMs know the truth and are lying to us, or if they are in actuality hallucinating. You want proof that your statement can’t be verified? The article the parent commenter is responding to is saying the exact fucking opposite. OpenAI makes an opposing argument, and it can go either way because we don’t have definitive proof either way. The article is saying that LLMs are “guessing” and that it’s an incentive problem: LLMs are inadvertently incentivized to guess, and if you incentivize the LLM not to guess confidently and to be more uncertain, the outcomes will change to what we expect. Right? If it’s just an incentive problem, it means the LLM does know the difference between truth and uncertainty and that we can coax this knowledge out of the LLM through incentives. | | |
| ▲ | kolektiv 5 days ago | parent | next [-] | | But an LLM is not answering "what is truth?". It's "answering" "what does an answer to the question "what is truth?" look like?". It doesn't need a conceptual understanding of truth - yes, there are far more wrong responses than right ones, but the right ones appear more often in the training data and so the probabilities assigned to the tokens which would make up a "right" one are higher, and thus returned more often. You're anthropomorphizing in using terms like "lying to us" or "know the truth". Yes, it's theoretically possible I suppose that they've secretly obtained some form of emergent consciousness and also decided to hide that fact, but there's no evidence that makes that seem probable - to start from that premise would be very questionable scientifically. A lot of people seem to be saying we don't understand what it's doing, but I haven't seen any credible proof that we don't. It looks miraculous to the relatively untrained eye - many things do, but just because I might not understand how something works, it doesn't mean nobody does. | | |
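To put toy numbers behind "the probabilities ... are higher, and thus returned more often": a minimal sketch with made-up scores, not any real model's internals.

    import math, random

    # Hypothetical next-token scores after the prompt "The capital of Colorado is".
    # In a real LLM these come out of the network; here they are invented to show the idea.
    logits = {"Denver": 9.0, "Boulder": 4.0, "Aurora": 3.0, "Paris": 1.0}

    # Softmax turns scores into probabilities; sampling then returns "Denver" almost
    # every time, not because anything was fact-checked, but because that continuation
    # dominated the training data and so earned the highest score.
    weights = {tok: math.exp(s) for tok, s in logits.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}

    print(probs)                                                  # Denver ~0.99
    print(random.choices(list(probs), list(probs.values()))[0])   # usually "Denver"

If the training data had overwhelmingly contained a wrong answer instead, the same machinery would return that wrong answer just as confidently; nothing in the sampling step distinguishes the two cases.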
| ▲ | rambambram 5 days ago | parent | next [-] | | Nice to read some common sense in a friendly way. I follow your RSS feed, please keep posting on your blog. Unless you're an AI and secretly obtained some form of emergent consciousness, then not. | |
▲ | ninetyninenine 5 days ago | parent | prev [-] | | >But an LLM is not answering "what is truth?". It's "answering" "what does an answer to the question "what is truth?" look like?". You don't actually know this, right? You said what I'm saying is theoretically possible, so you're contradicting yourself. >You're anthropomorphizing in using terms like "lying to us" or "know the truth". Yes, it's theoretically possible I suppose that they've secretly obtained some form of emergent consciousness and also decided to hide that fact, but there's no evidence that makes that seem probable - to start from that premise would be very questionable scientifically. Where did I say it's conscious? You hallucinated here, thinking I said something I didn't. Just because you can lie doesn't mean you're conscious. For example, a sign can lie to you. If the speed limit is 60 but there's a sign that says the speed limit is 100, then the sign is lying. Is the sign conscious? No. Knowing is a different story though. But think about this carefully. How would we determine whether a "human" knows anything? We can only tell whether a "human" "knows" things based on what it tells us. Just like an LLM. So based off of what the LLM tells us, it's MORE probable that the LLM "knows", because that's the exact SAME reasoning by which we can tell a human "knows". There's no other way we can determine whether or not an LLM or a human "knows" anything. So really I'm not anthropomorphizing anything. You're the one that's falling for that trap. Knowing and lying are not concepts unique to consciousness or humanity. These are neutral concepts that exist beyond what it means to be human. When I say something "knows" or something "lies", I'm saying it from a highly unbiased and neutral perspective. It is your bias that causes you to anthropomorphize these concepts with the hallucination that they are human-centric concepts. >A lot of people seem to be saying we don't understand what it's doing, but I haven't seen any credible proof that we don't. Bro. You're out of touch. https://www.youtube.com/watch?v=qrvK_KuIeJk&t=284s Hinton, the godfather of modern AI, says we don't understand. It's not just people saying we don't understand. It's the general understanding within academia: we don't understand LLMs. So you're wrong. You don't know what you're talking about and you're highly misinformed. | | |
▲ | zbentley 5 days ago | parent [-] | | I think your assessment of the academic take on AI is wrong. We have a rather thorough understanding of the how/why of the mechanisms of LLMs, even if their results after training sometimes surprise us. Additionally, there is a very large body of academic research that digs into how LLMs seem to understand concepts and truths, and, sure enough, examples of us making point edits to models to change the “facts” that they “know”. My favorite of that corpus, though far from the only or most current/advanced research, is the Bau Lab’s work: https://rome.baulab.info/ | | |
▲ | ninetyninenine 5 days ago | parent | next [-] | | It’s not about what you think; it’s about who’s factually right or wrong. You referenced work on model interpretability, which is essentially the equivalent of putting an MRI or electrodes on the human brain and saying we understand the brain because some portion of it lights up when we show it a picture of a cow. There’s lots of work on model interpretability, just like there’s lots of science involving brain scans of the human brain… the problem is that none of this gives insight into how the brain or an LLM works. In terms of understanding LLMs, we overall don’t understand what’s going on. It’s not like I didn’t know about attempts to decode what’s going on in these neural networks… I know all about it, but none of it changes the overall sentiment of: we don’t know how LLMs work. This is fundamentally different from computers. We know how computers work such that we can emulate a computer. But for an LLM, we can’t fully control it, we don’t fully understand why it hallucinates, we don’t understand how to fix the hallucinations, and we definitely cannot emulate an LLM in the same way we do for a computer. It isn’t just that we don’t understand LLMs. It’s that there isn’t anything in the history of human invention that we lack such fundamental understanding of. Off of that logic, the facts are unequivocally clear: we don’t understand LLMs and your statement is wrong. But it goes beyond this. I’m not just saying this. This is the accepted general sentiment in academia, and you can watch that video of Hinton, the godfather of AI in academia, basically saying the exact opposite of your claim here. He literally says we don’t understand LLMs. | | | |
| ▲ | riwsky 4 days ago | parent | prev [-] | | Here’s where you're clearly wrong. The correct favorite in that corpus is Golden Gate Claude: https://www.anthropic.com/news/golden-gate-claude | | |
▲ | zbentley 3 days ago | parent [-] | | Both are very good! I usually default to sharing the Bau Lab's work on this subject rather than Anthropic's because a) it's a little less fraught when sharing with folks who are skeptical of commercial AI companies, and b) Bau's linked research/notebooks/demos/graphics are a lot more accessible to different points on the spectrum between "machine learning academic researcher" and "casual reader"; "Scaling/Towards Monosemanticity" are both massive and, depending on the section, written for pretty extreme ends of the layperson/researcher spectrum. The Anthropic papers also cover a lot more subjects (e.g. feature splitting, discussion on use in model moderation, activation penalties) than Bau Lab's, as well--which is great, but maybe not when shared as a targeted intro to interpretability/model editing.
|
|
|
| |
▲ | Jensson 5 days ago | parent | prev | next [-] | | > Have the LLM talk about what “truth” is and the nature of LLM hallucinations and it can cook up an explanation that demonstrates it completely understands the concepts. This isn't how an LLM works. What an LLM understands has nothing to do with the words it says; it only has to do with what connections it has seen. If an LLM has only seen a manual but has never seen examples of how the product is used, then it can tell you exactly how to use the product by writing out info from the manual, but if you ask it to do those things then it won't be able to, since it has no examples to go by. This is the primary misconception most people have, and it makes them overestimate what their LLM can do: no, they don't learn by reading instructions, they only learn by seeing examples and then doing the same thing. So an LLM talking about truth just comes from it having seen others talk about truth, not from it thinking about truth on its own. This is fundamentally different from how humans think about words. | | |
▲ | ninetyninenine 5 days ago | parent [-] | | >This isn't how an LLM works. I know how an LLM works. I've built one. At best we only know surface-level stuff, like the fact that it involves a feed-forward network and is using token prediction. But the emergent effect of how an LLM produces an overall statement that reflects high-level conceptual understanding is something we don't know. So your claim of "This isn't how an LLM works", which was said with such confidence, is utterly wrong. You don't know how it works; no one does. |
| |
| ▲ | catlifeonmars 5 days ago | parent | prev [-] | | > Have the LLM talk about what “truth” is and the nature of LLM hallucinations and it can cook up an explanation that demonstrates it completely understands the concepts. There is not necessarily a connection between what an LLM understands and what it says. It’s totally possible to emit text that is logically consistent without understanding. As a trivial example, just quote from a physics textbook. I’m not saying your premise is necessarily wrong: that LLMs can understand the difference between truth and falsehood. All I’m saying is you can’t infer that from the simple test of talking to an LLM. | | |
▲ | ninetyninenine 5 days ago | parent [-] | | >There is not necessarily a connection between what an LLM understands and what it says. It’s totally possible to emit text that is logically consistent without understanding. As a trivial example, just quote from a physics textbook. This is true, but you could say the same thing about a human too, right? There's no way to establish a connection between what a human says and whether or not a human understands something. Right? We can't do mind reading here. So how do we determine whether or not a human understands something? Based off of what the human tells us. So I'm just extrapolating that concept to the LLM. It knows things. Does it matter what the underlying mechanism is? If we get LLM output to be perfect in every way but the underlying mechanism is still feed-forward networks with token prediction, then I would still say it "understands", because that's the EXACT metric we use to determine whether a human "understands" things. >I’m not saying your premise is necessarily wrong: that LLMs can understand the difference between truth and falsehood. All I’m saying is you can’t infer that from the simple test of talking to an LLM. Totally understood. And I didn't say that it knew the difference. I was saying basically a different version of what you're saying. You say: we can't determine if it knows the difference between truth and falsehood.
I say: we can't determine if it doesn't know the difference between truth and falsehood. Neither statement contradicts the other. The parent commenter, imo, was making a definitive statement in that he claims we know it doesn't understand, and I was just contradicting that. |
|
|
|
|
| ▲ | Zigurd 5 days ago | parent | prev | next [-] |
| I recently asked Gemini to riff on the concept of "Sustainable Abundance" and come up with similar plausible bullshit. I could've filled a slate of TED talks with the brilliant and plausible sounding nonsense it came up with. Liberated from the chains of correctness, LLMs' power is unleashed. For example: The Symbiocene Horizon: A term suggesting a techno-utopian future state where humanity and technology have merged with ecological systems to achieve a perfect, self-correcting state of equilibrium. |
| |
|
| ▲ | fumeux_fume 5 days ago | parent | prev | next [-] |
In the article, OpenAI defines hallucinations as "plausible but false statements generated by language models." So clearly it's not all that LLMs know how to do. I don't think Parsons is working from a useful or widely agreed-upon definition of what a hallucination is, which leads to these "hot takes" that just clutter and muddy up the conversation around how to reduce hallucinations and produce more useful models.
| |
▲ | mpweiher 5 days ago | parent | next [-] | | They just redefined the term so that hallucinations which happen to be useful are no longer called hallucinations. But the people who say everything LLMs do is hallucinate clearly also make that distinction; they just refuse to rename the useful hallucinations. "How many legs does a dog have if you call his tail a leg? Four. Saying that a tail is a leg doesn't make it a leg." -- Abraham Lincoln | | |
▲ | johnnyanmac 5 days ago | parent [-] | | I'd say a human's ability to reason about theoretical situations like this is the very core of our creativity and intelligence, though. This quote makes sense for a policy maker, but not a scientist. Now granted, we also need to back up those notions with rigorous testing and observation, but that "if a tail is a leg" hypothetical is the basis of the reasoning. |
| |
| ▲ | mcphage 5 days ago | parent | prev [-] | | LLMs don’t know the difference between true and false, or that there even is a difference between true and false, so I think it’s OpenAI whose definition is not useful. As for widely agreed upon, well, I’m assuming the purpose of this post is to try and reframe the discussion. | | |
▲ | hodgehog11 5 days ago | parent [-] | | If an LLM outputs a statement that is by definition either true or false, then we can know whether it is true or false. Whether the LLM "knows" is irrelevant. The OpenAI definition is useful because it implies hallucination is something that can be logically avoided. > I’m assuming the purpose of this post is to try and reframe the discussion It's to establish a meaningful and practical definition of "hallucinate" in order to actually make some progress. If everything is a hallucination, as the other comments seem to suggest, then the term is a tautology and is of no use to us. | | |
| ▲ | kolektiv 5 days ago | parent | next [-] | | It's useful as a term of understanding. It's not useful to OpenAI and their investors, so they'd like that term to mean something else. It's very generous to say that whether an LLM "knows" is irrelevant. They would like us to believe that it can be avoided, and perhaps it can, but they haven't shown they know how to do so yet. We can avoid it, but LLMs cannot, yet. Yes, we can know whether something is true or false, but this is a system being sold as something useful. If it relies on us knowing whether the output is true or false, there is little point in us asking it a question we clearly already know the answer to. | | |
▲ | hodgehog11 4 days ago | parent [-] | | I mean no disrespect, as I'm no more fond of OpenAI than anyone else (they are still the villains in this space), but I strongly disagree. > It's useful as a term of understanding. No it isn't. I dare you to try publishing in this field with that definition. Claiming all outputs are hallucinations because it's a probabilistic model tells us nothing of value about what the model is actually doing. By this definition, literally everything a human says is a hallucination as well. It is only valuable to those who wish to believe that LLMs can never do anything useful, which, as Hinton says, is really starting to sound like an ego-driven religion at this point. Those that follow it do not publish in top relevant outlets any more, and should not be regarded as experts on the subject. > they haven't shown they know how to do so yet. We can avoid it, but LLMs cannot, yet. This is exactly what they argue in the paper. They discuss the logical means by which humans are able to bypass making false statements by saying "I don't know". A model that responds only with a lookup table and an "I don't know" (sketched below) can never give false statements, but is probably not so useful either. There is a sweet spot here, and humans are likely close to it. > If it relies on us knowing whether the output is true or false I never said the system relies on it. I said that our definition of hallucination, and therefore our metrics by which to measure it, depend only on our knowing whether the output is true. This is no different from any other benchmark. They are claiming that it might be useful to make a new benchmark for this concept. |
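For what it's worth, the lookup-table-plus-"I don't know" model is easy to write down. A minimal sketch (hypothetical facts, purely illustrative): it can never state a falsehood, but it also can't say anything it wasn't explicitly given.

    # Toy abstaining "model": answer only from a fixed table of (made-up) facts,
    # otherwise refuse. It never emits a false statement, but it isn't very useful.
    KNOWN_FACTS = {
        "capital of colorado": "Denver",
        "boiling point of water at sea level": "100 degrees C",
    }

    def answer(question: str) -> str:
        # Abstain on anything outside the table instead of guessing.
        key = question.strip().lower().rstrip("?")
        return KNOWN_FACTS.get(key, "I don't know")

    print(answer("Capital of Colorado"))                 # Denver
    print(answer("Who wrote the Voynich manuscript?"))   # I don't know

The sweet spot the paper gestures at sits somewhere between this and a model that always guesses.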
| |
| ▲ | username223 5 days ago | parent | prev [-] | | "Logically avoided?" OpenAI has a machine that emits plausible text. They're trying to argue that "emitting plausible text" is the hard problem, and "modeling the natural world, human consciousness, society, etc." is the easy one. | | |
| ▲ | hodgehog11 4 days ago | parent [-] | | Hmm, I don't see where they have suggested this, could you point to where this is? If they do argue for this, then I would also disagree with them. Modelling those things is a separate problem to emitting plausible text and pursuing one is not necessarily beneficial to the other. It seems more sensible to pursue separate models for each of these tasks. |
|
|
|
|
|
| ▲ | throwawaymaths 5 days ago | parent | prev | next [-] |
| that's wrong. there is probably a categorical difference between making something up due to some sort of inferential induction from the kv cache context under the pressure of producing a token -- any token -- and actually looking something up and producing a token. so if you ask, "what is the capital of colorado" and it answers "denver" calling it a Hallucination is nihilistic nonsense that paves over actually stopping to try and understand important dynamics happening in the llm matrices |
| |
▲ | saghm 5 days ago | parent | next [-] | | > so if you ask, "what is the capital of colorado" and it answers "denver" calling it a Hallucination is nihilistic nonsense that paves over actually stopping to try and understand important dynamics happening in the llm matrices On the other hand, calling it anything other than a hallucination misrepresents truth as something these models can actually distinguish, as if they could tell which of their outputs accurately reflect reality, and it conflates a fundamentally unsolved problem with an engineering tradeoff. | | |
| ▲ | ComplexSystems 5 days ago | parent [-] | | It isn't a hallucination because that isn't how the term is defined. The term "hallucination" refers, very specifically, to "plausible but false statements generated by language models." At the end of the day, the goal is to train models that are able to differentiate between true and false statements, at least to a much better degree than they can now, and the linked article seems to have some very interesting suggestions about how to get them to do that. | | |
| ▲ | player1234 3 days ago | parent | next [-] | | Why use a word that you have to redefine the meaning of? The answer is to deceive. | |
| ▲ | throwawaymaths 5 days ago | parent | prev [-] | | your point is good and taken but i would amend slightly -- i dont think that "absolute truth" is itself a goal, but rather "how aware is it that it doesn't know something". this negative space is frustratingly hard to capture in the llm architecture (though almost certainly there are signs -- if you had direct access to the logits array, for example) |
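A rough sketch of the kind of signal that logits access could give (hypothetical numbers; whether and how a given model exposes this varies):

    import math

    def uncertainty_signals(next_token_probs):
        # Two crude signals one could read off a next-token distribution:
        # a low maximum probability or a high entropy both suggest the model
        # has no single continuation it is "sure" about.
        max_p = max(next_token_probs)
        entropy = -sum(p * math.log(p) for p in next_token_probs if p > 0)
        return max_p, entropy

    confident = [0.97, 0.01, 0.01, 0.01]  # one continuation dominates
    unsure = [0.28, 0.26, 0.24, 0.22]     # no continuation dominates

    print(uncertainty_signals(confident))  # (0.97, ~0.17): low entropy
    print(uncertainty_signals(unsure))     # (0.28, ~1.38): near-uniform, high entropy

Turning signals like these into a calibrated "I don't know" is the hard part, which is roughly where the article's incentive argument comes in.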
|
| |
▲ | mannykannot 5 days ago | parent | prev | next [-] | | There is a way to state Parsons' point which avoids this issue: hallucinations are just as much a consequence of the LLM working as designed as are correct statements. | | |
| ▲ | throwawaymaths 5 days ago | parent [-] | | fine. which part is the problem? | | |
| ▲ | mannykannot 4 days ago | parent | next [-] | | I suppose you are aware that, for many uses of LLMs, the propensity for hallucinating is a problem (especially if this is not properly taken into account by the people hoping to use these LLMs), but this then leaves me puzzled about what you are asking here. | |
▲ | johnnyanmac 5 days ago | parent | prev [-] | | The part where it can't recognize situations where there's not enough data/training and admit it doesn't know. I'm a bit surprised no one talks about this factor. It's like talking to a giant narcissist who can Google really fast but can't understand what it reads. The ability to admit ignorance is a major factor in credibility, because none of us know everything all at once. | |
|
| |
| ▲ | littlestymaar 5 days ago | parent | prev [-] | | > that's wrong. Why would anyone respond with so little nuance? > a Hallucination Oh, so your shift key wasn't broken all the time, then why aren't you using it in your sentences? |
|
|
| ▲ | leptons 5 days ago | parent | prev [-] |
| "A broken clock is right twice a day" |
| |
| ▲ | cwmoore 5 days ago | parent [-] | | A stopped clock. There are many other ways to be wrong than right. |
|