fumeux_fume 5 days ago

In the article, OpenAI defines hallucinations as "plausible but false statements generated by language models." So clearly hallucinating is not all that LLMs know how to do. I don't think Parsons is working from a useful or widely agreed-upon definition of what a hallucination is, which leads to these "hot takes" that just clutter and muddy the conversation around how to reduce hallucinations and produce more useful models.

mpweiher 5 days ago | parent | next [-]

They just redefined the term so that useful hallucinations are no longer called hallucinations.

But the people who say that everything LLMs do is hallucination clearly make that distinction too; they just refuse to rename the useful hallucinations.

"How many legs does a dog have if you call his tail a leg? Four. Saying that a tail is a leg doesn't make it a leg." -- Abraham Lincoln

johnnyanmac 5 days ago | parent [-]

I'd say a human's ability to reason about hypothetical situations like this is the very core of our creativity and intelligence, though. The quote makes sense for a policy maker, but not for a scientist.

Now granted, we also need to back up those notions with rigorous testing and observation, but those "if a tail were a leg" hypotheticals are the basis of the reasoning.

mcphage 5 days ago | parent | prev [-]

LLMs don't know the difference between true and false, or even that there is a difference, so I think it's OpenAI whose definition is not useful. As for "widely agreed upon", well, I'm assuming the purpose of this post is to try to reframe the discussion.

hodgehog11 5 days ago | parent [-]

If an LLM outputs a statement, that statement is by definition either true or false, and we can determine which. Whether the LLM "knows" is irrelevant. The OpenAI definition is useful because it implies that hallucination is something that can be logically avoided.

> I’m assuming the purpose of this post is to try and reframe the discussion

It's to establish a meaningful, practical definition of "hallucinate" so that we can actually make some progress. If everything is a hallucination, as the other comments seem to suggest, then the term is a tautology and of no use to us.

kolektiv 5 days ago | parent | next [-]

It's useful as a term of understanding. It's just not useful to OpenAI and their investors, so they'd like it to mean something else. It's very generous to say that whether an LLM "knows" is irrelevant. They would like us to believe that hallucination can be avoided, and perhaps it can, but they haven't shown that they know how to do it yet. We can avoid it; LLMs, so far, cannot.

Yes, we can know whether something is true or false, but this is a system being sold as something useful. If it relies on us already knowing whether the output is true or false, there is little point in asking it a question we clearly already know the answer to.

hodgehog11 4 days ago | parent [-]

I mean no disrespect, as I'm no more fond of OpenAI than anyone else (they are still the villains in this space), but I strongly disagree.

> It's useful as a term of understanding.

No, it isn't. I dare you to try publishing in this field with that definition. Claiming that all outputs are hallucinations because the model is probabilistic tells us nothing of value about what the model is actually doing. By that definition, literally everything a human says is a hallucination as well. It is only valuable to those who wish to believe that LLMs can never do anything useful, which, as Hinton says, is really starting to sound like an ego-driven religion at this point. Those who follow it no longer publish in the top relevant outlets and should not be regarded as experts on the subject.

> they haven't shown they know how to do so yet. We can avoid it, but LLMs cannot, yet.

This is exactly what they argue in the paper. They discuss the logical means by which humans avoid making false statements: by saying "I don't know". A model that responds only from a lookup table and otherwise answers "I don't know" can never give a false statement, but it is probably not very useful either. There is a sweet spot here, and humans are likely close to it.
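To make that concrete, here is a toy sketch of such a responder (my own construction, not anything from the paper; the facts and names are made up):

    # A "lookup table plus abstention" responder: it answers only questions
    # it has a verified fact for, and otherwise says "I don't know", so it
    # can never emit a false statement -- but it also can't do much.
    KNOWN_FACTS = {  # hypothetical verified question -> answer pairs
        "capital of france": "Paris",
        "legs on a dog": "4",
    }

    def answer(question: str) -> str:
        key = question.strip().lower().rstrip("?")
        # Abstain on anything outside the verified set rather than guess.
        return KNOWN_FACTS.get(key, "I don't know")

    print(answer("Capital of France?"))    # Paris
    print(answer("Who wrote this post?"))  # I don't know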

> If it relies on us knowing whether the output is true or false

I never said the system relies on it. I said that our definition of hallucination, and therefore the metrics by which we measure it, depend only on our knowing whether the output is true. That is no different from any other benchmark. They are claiming that it might be useful to build a new benchmark around this concept.
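As a rough sketch of what such a benchmark could look like (my own assumptions, not OpenAI's actual evaluation): grade each output against a known ground truth, track abstentions separately, and report the fraction of answered items that are false.

    # Hallucination rate depends only on the grader knowing the true answer,
    # not on anything internal to the model. "I don't know" is tracked
    # separately instead of being counted as a false statement.
    def hallucination_rate(outputs, ground_truth):
        false_statements = 0
        abstentions = 0
        for out, truth in zip(outputs, ground_truth):
            if out.strip().lower() == "i don't know":
                abstentions += 1
            elif out.strip() != truth:
                false_statements += 1
        answered = len(outputs) - abstentions
        rate = false_statements / answered if answered else 0.0
        return rate, abstentions

    rate, abstained = hallucination_rate(
        ["Paris", "I don't know", "1905"],
        ["Paris", "Berlin", "1912"],
    )
    print(rate, abstained)  # 0.5 1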

username223 5 days ago | parent | prev [-]

"Logically avoided?"

OpenAI has a machine that emits plausible text. They're trying to argue that "emitting plausible text" is the hard problem, and "modeling the natural world, human consciousness, society, etc." is the easy one.

hodgehog11 4 days ago | parent [-]

Hmm, I don't see where they have suggested this. Could you point to where they do? If they do argue this, then I would also disagree with them.

Modelling those things is a separate problem from emitting plausible text, and pursuing one is not necessarily beneficial to the other. It seems more sensible to pursue separate models for each of these tasks.