Ukv 2 days ago

Agree that's a better intuition: pretraining pushes the model towards saying "I don't know" in the kinds of situations where people write that, rather than through introspection of its own confidence.

ACCount37 2 days ago | parent [-]

There appears to be a degree of "introspection of its own confidence" in modern LLMs. They can identify their own hallucinations at a rate significantly better than chance, so there must be some sort of "do I recall this?" mechanism built into them, even if it's not an entirely reliable one.
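One crude, externally observable proxy for that kind of self-confidence signal is the shape of the model's next-token distribution: a peaked distribution suggests the model "recalls" something, a flat one suggests it's guessing. This is just an illustrative sketch (the function name and the toy logit values are made up, and real hallucination detection is much more involved than per-token entropy):

```python
import math

def token_confidence(logits):
    """Softmax the raw logits and return two naive per-token
    confidence proxies: the top token's probability and the
    distribution's entropy (lower entropy = more 'certain')."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    top_p = max(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return top_p, entropy

# Toy comparison: a peaked distribution vs. a nearly flat one.
sure = token_confidence([9.0, 1.0, 0.5, 0.2])
unsure = token_confidence([1.1, 1.0, 0.9, 1.0])
assert sure[0] > unsure[0]   # higher top-token probability when "sure"
assert sure[1] < unsure[1]   # lower entropy when "sure"
```

Of course, the interesting claim in the thread is that models have an *internal* recall signal beyond this surface statistic, which logit entropy alone can't demonstrate.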

Anthropic's interpretability work has shown this is the case for name recognition, and I suspect that names aren't the only thing subject to a process like that.