kingstnap 5 days ago

There is a deeply wrong part of this paper that no one has mentioned:

The model head doesn't hallucinate. The sampler does.

Say you ask an LLM when x was born and it doesn't know, and you take a look at the actual model output, which is a probability distribution over tokens.

"IDK" is cleanly represented as a uniform probability over Jan 1 to Dec 31.

If you ask it a multiple-choice question it doesn't know the answer to, it will say this:

25% A, 25% B, 25% C, 25% D.

Which is exactly, and correctly, the "right answer": the model has admitted it doesn't know. It doesn't hallucinate anything.

In reality we need something smarter than a random sampler to actually extract this information. The knowledge, and the lack of knowledge, is there; the sampler just produces bullshit out of it.
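As a rough illustration of what "something smarter than a random sampler" could look like, here is a minimal Python sketch. The logits are made up and the option set, function names, and 0.9 threshold are all invented for the example, not anyone's actual API: it reads the probability mass over the answer options and abstains when the distribution is close to uniform instead of sampling one anyway.

    import math

    def softmax(logits):
        """Convert raw logits into a probability distribution."""
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def entropy(probs):
        """Shannon entropy in bits; log2(4) = 2 bits is the maximum for 4 options."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    def answer_or_abstain(option_logits, options=("A", "B", "C", "D"), threshold=0.9):
        """Pick the argmax option, but report "IDK" when the distribution over
        the options is close to uniform (entropy near its maximum)."""
        probs = softmax(option_logits)
        max_entropy = math.log2(len(options))
        if entropy(probs) > threshold * max_entropy:
            return "IDK", probs
        best = max(range(len(options)), key=lambda i: probs[i])
        return options[best], probs

    # A "doesn't know" case: near-uniform logits over A/B/C/D.
    print(answer_or_abstain([0.1, 0.0, 0.05, -0.1]))   # -> ('IDK', probs ~0.25 each)

    # A "knows" case: one option clearly dominates.
    print(answer_or_abstain([4.0, 0.0, -1.0, -1.0]))   # -> ('A', ...)

The threshold is arbitrary; the point is only that the abstention decision uses the whole distribution rather than a single sampled token.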

ACCount37 5 days ago

No, that's a misconception. It's not nearly that simple.

There are questions that have a palpable split in probability between the answers, with the logit distribution immediately exposing the underlying lack of confidence.

But there are also questions that cause an LLM to produce consistent-but-wrong answers. For example, because the question was internally associated with another, not-the-same-but-somewhat-similar question, and that was enough to put 93% on B, despite B being the wrong answer.

An LLM might even have some latent awareness of its own uncertainty in this case. But it has, for some reason, decided to proceed with a "best guess" answer, which was in this case wrong.
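To make that failure mode concrete, here is the same kind of inspection with invented numbers (not taken from any real model): a confidently-wrong distribution is low-entropy, so nothing in the output probabilities flags the error.

    import math

    # Invented logits for a miscalibrated case: the correct answer is C, but a
    # similar-looking training question pushes ~93% of the mass onto B.
    logits = {"A": 0.0, "B": 3.7, "C": 0.0, "D": -0.5}
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    probs = {k: e / total for k, e in exps.items()}
    ent = -sum(p * math.log2(p) for p in probs.values())

    print(probs)                                               # B gets ~0.93 of the mass
    print(f"{ent:.2f} bits of entropy vs a 2.00-bit maximum")  # looks "confident"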

a2128 3 days ago

This is only true if you have a pretrained base model trained on infinite true data with no bias. In practice it will have picked up some bias: maybe it encountered more famous people named "James" with birthdays in January and on days starting with the digit 2, so Jan 2 and Jan 20-29 have a higher probability than the rest. But finetuning, and especially RL, completely break these probabilities as a measure of certainty, because the goal shifts from generally modelling text to something else entirely.

numeri 5 days ago

This isn't right – calibration (informally, the degree to which certainty in the model's logits correlates with its chance of getting an answer correct) is well studied in LLMs of all sizes. LLMs are not (generally) well calibrated.
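For context, calibration is commonly quantified with something like expected calibration error: bucket predictions by the model's stated confidence and compare each bucket's average confidence with its actual accuracy. A toy sketch with made-up (confidence, correct) pairs rather than real model outputs:

    def expected_calibration_error(confidences, correct, n_bins=10):
        """ECE: |accuracy - mean confidence| per equal-width confidence bin,
        weighted by how many predictions fall in each bin."""
        bins = [[] for _ in range(n_bins)]
        for conf, ok in zip(confidences, correct):
            idx = min(int(conf * n_bins), n_bins - 1)
            bins[idx].append((conf, ok))
        n = len(confidences)
        ece = 0.0
        for b in bins:
            if not b:
                continue
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += (len(b) / n) * abs(accuracy - avg_conf)
        return ece

    # Made-up example: the model reports high confidence but is right only about
    # half the time in the top bin, so the ECE comes out large (poorly calibrated).
    confs   = [0.95, 0.90, 0.92, 0.88, 0.60, 0.55]
    correct = [1,    0,    0,    1,    1,    0]
    print(expected_calibration_error(confs, correct))

A well-calibrated model would have its 90%-confidence answers be right about 90% of the time, which is the property the parent comment's argument quietly assumes.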

cyanydeez 5 days ago

I'm betting there's a graph model using various vectors that could improve known-knowns in outcomes.

But unknown-unknowns likely reduce to the Halting problem, which human intelligence doesn't really solve either.