Ukv 2 days ago

Agree that's a better intuition: pretraining pushes the model towards saying "I don't know" in the kinds of situations where people write that, rather than through introspection of its own confidence.

ACCount37 2 days ago | parent [-]

There appears to be a degree of "introspection of its own confidence" in modern LLMs. They can identify their own hallucinations at a rate significantly better than chance, so there must be some sort of "do I recall this?" mechanism built into them, even if it's not an entirely reliable one.
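One crude, externally observable proxy for that kind of self-confidence signal is the shape of the model's next-token distribution: a peaked distribution suggests the model "recalls" something, a flat one suggests it's guessing. This is just an illustrative sketch (the function name and the toy logit values are made up, and real hallucination detection is much more involved than per-token entropy):

```python
import math

def token_confidence(logits):
    """Softmax the raw logits and return two naive per-token
    confidence proxies: the top token's probability and the
    distribution's entropy (lower entropy = more 'certain')."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    top_p = max(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return top_p, entropy

# Toy comparison: a peaked distribution vs. a nearly flat one.
sure = token_confidence([9.0, 1.0, 0.5, 0.2])
unsure = token_confidence([1.1, 1.0, 0.9, 1.0])
assert sure[0] > unsure[0]   # higher top-token probability when "sure"
assert sure[1] < unsure[1]   # lower entropy when "sure"
```

Of course, the interesting claim in the thread is that models have an *internal* recall signal beyond this surface statistic, which logit entropy alone can't demonstrate.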

Anthropic's interpretability work has shown this is the case for name recognition, and I suspect that names aren't the only thing subject to a process like that.