e3bc54b2 5 days ago

Hallucination is all an LLM does. That is its nature: to hallucinate.

We just happen to find some of these hallucinations useful.

Let's not pretend that hallucination is a byproduct. The usefulness is the byproduct. That is what surprised the original researchers about transformer performance, and that is why the 'Attention Is All You Need' paper remains such a phenomenon.

fumeux_fume 5 days ago | parent | next [-]

> Hallucination is all an LLM does.

I wish people who take this stance would seriously reconsider how they define hallucination and how unhelpful it is to conflate hallucination with any generation from a probability distribution. I appreciate OpenAI publishing articles like this because, while the parent commenter and I may have to agree to disagree on how hallucinations are defined, I can at least appeal to OpenAI's authority to say that such arguments are not only unhelpful but also unsound.

Zigurd 5 days ago | parent [-]

You're going to get a lot of pushback on the idea of taking the definition of hallucination seriously. Calling fluently stated bunk "hallucination" feels cynical to begin with. Trying to weave a silk purse out of that sow's ear is difficult.

hodgehog11 5 days ago | parent | prev [-]

I don't know what you mean by hallucination here; are you saying that any statistical output is a "hallucination"? If so, then we are also constantly hallucinating, I guess.

There doesn't seem to be a particularly consistent definition of what "hallucinate" means in the context of LLMs, so let's make one that is in line with the post.

"Hallucination" is when a language model outputs a sequence of tokens comprising a statement (an assertion that is either true or false) that is incorrect. Under this definition, hallucination is clearly not all that an LLM can do.

An easy way to avoid hallucination under this definition is to respond with something that is not a statement whenever there is a possibility of being incorrect, e.g. "I think that..." or "I don't know." To me, this seems to be what the authors argue. It has always seemed pretty obvious to most people I've spoken to (hell, I've reviewed grant applications from years ago that talk about this), so I'm not sure why it took so long for the "frontier" developers to actually try it.
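
To make the abstention idea concrete, here is a minimal, self-contained Python sketch of confidence-gated answering. Everything in it is an illustrative assumption rather than anything from the article: the stubbed model call, the made-up confidence scores, and the 0.75 threshold. A real system might derive confidence from token log-probs, self-consistency, or a separate verifier.

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        text: str
        confidence: float  # assumed: a calibrated estimate that the answer is correct

    def generate_candidate(question: str) -> Candidate:
        # Stand-in for a real model call (hypothetical); returns canned
        # answers with made-up confidences so the example is runnable.
        canned = {
            "What is 2 + 2?": Candidate("4", 0.99),
            "Who won the 1987 Springfield chess open?": Candidate("(a guess)", 0.30),
        }
        return canned.get(question, Candidate("(no idea)", 0.0))

    def answer(question: str, threshold: float = 0.75) -> str:
        cand = generate_candidate(question)
        if cand.confidence >= threshold:
            return cand.text       # assert only when reasonably confident
        return "I don't know."     # abstain rather than state something that may be wrong

    print(answer("What is 2 + 2?"))                            # -> 4
    print(answer("Who won the 1987 Springfield chess open?"))  # -> I don't know.

The point is only that, under the definition above, the abstaining branch can never hallucinate, because it never asserts a statement that could be false; whether the confidence estimate is any good is, of course, the hard part.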