binarymax 2 days ago

Saying “I don’t know” to 30% of queries, if it actually doesn’t know, is a feature I want. Otherwise there is zero trust. How do I know whether I’m in the 30%-wrong or the 70%-correct case right now?

nunez 2 days ago | parent | next [-]

The paper does a good job explaining why this is mathematically not possible unless the question-answer bank is a fixed set.

smallmancontrov 2 days ago | parent [-]

Quite the opposite: it explains that it is mathematically straightforward to achieve better alignment on uncertainty ("calibration") but that leaderboards penalize it.

> This “epidemic” of penalizing uncertain responses can only be addressed through a socio-technical mitigation: modifying the scoring of existing benchmarks that are misaligned but dominate leaderboards
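
To make the incentive concrete, a back-of-the-envelope sketch (my illustrative numbers, not the paper's scoring rules): under plain 0/1 accuracy a wrong answer and an abstention both score zero, so any nonzero chance of guessing right makes guessing strictly better, while a penalty for wrong answers makes abstaining rational below a confidence threshold.

    # Expected benchmark score for "guess" vs. "abstain" at confidence p.
    # Illustrative grading schemes only, not the paper's exact rules.
    def expected_score(p, wrong_penalty):
        guess = p * 1.0 + (1 - p) * (-wrong_penalty)  # right: +1, wrong: -penalty
        abstain = 0.0                                  # "I don't know" scores 0
        return guess, abstain

    for p in (0.1, 0.3, 0.5):
        for penalty in (0.0, 1.0):  # 0.0 = today's binary grading
            g, a = expected_score(p, penalty)
            print(f"p={p:.1f} penalty={penalty:.1f} -> "
                  f"guess={g:+.2f} abstain={a:+.2f} "
                  f"best={'guess' if g > a else 'abstain'}")

With penalty 0.0, guessing wins at every p > 0, which is exactly the pressure the quote is describing.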

Even more embarrassing, it looks like this is something we beat into models rather than something we can't beat out of them:

> empirical studies (Fig. 2) show that base models are often found to be calibrated, in contrast to post-trained models

That said, I generally appreciate a fairly strong bias-to-action, and I find the fact that it got slightly overcooked less offensive than the alternative: an undercooked bias-to-action where the model studiously avoids doing anything useful in favor of "it depends" plus three plausible reasons why.

baq 2 days ago | parent [-]

> leaderboards penalize it

> socio-technical mitigation: modifying the scoring of existing benchmarks that are misaligned but dominate leaderboards

Sounds more like we need new leaderboards, and the old ones should be deprecated.

smallmancontrov 2 days ago | parent [-]

Yeah, it's a big enough lift that I think it's fair to allow the leaderboard teams new announcements and buzzwords in exchange for doing the work :-)

jeremyjh 2 days ago | parent | prev [-]

It doesn’t know what it doesn’t know.

fallpeak 2 days ago | parent | next [-]

It doesn't know that because it wasn't trained on any tasks that required it to develop that understanding. There's no fundamental reason an LLM couldn't learn "what it knows" in parallel with the things it knows, given a suitable reward function during training.
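
One way to wire in that kind of signal (a toy supervised variant rather than a literal reward function; `lm_hidden` and `answered_correctly` are hypothetical inputs you'd get from the base model and from grading its sampled answers): train a small "do I know this?" probe alongside the usual objective.

    # Toy sketch: a probe that learns P(my answer will be correct) from the
    # model's own hidden state, supervised by grading its sampled answers.
    import torch
    import torch.nn as nn

    class KnowerProbe(nn.Module):
        def __init__(self, hidden_size):
            super().__init__()
            self.linear = nn.Linear(hidden_size, 1)

        def forward(self, lm_hidden):  # lm_hidden: [batch, hidden_size]
            return torch.sigmoid(self.linear(lm_hidden)).squeeze(-1)

    def knower_loss(probe, lm_hidden, answered_correctly):
        # answered_correctly: 1.0 if the sampled answer matched the reference
        return nn.functional.binary_cross_entropy(probe(lm_hidden),
                                                  answered_correctly.float())

The point is that "do I know this?" is learnable from data the training pipeline already has (did my answer turn out to be right?); it just has to be asked for.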

binarymax 2 days ago | parent | prev | next [-]

Well sure. But maybe the token logprobs can be used to help give a confidence assessment.
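
Something along these lines (a minimal sketch with Hugging Face transformers; "gpt2" is just a stand-in for any causal LM, and mean token logprob is only a crude proxy for confidence):

    # Score how strongly the model endorses an answer, via its token logprobs.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # placeholder; any causal LM works the same way
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    def answer_confidence(prompt, answer):
        """Mean logprob of the answer tokens, conditioned on the prompt."""
        prompt_ids = tok(prompt, return_tensors="pt").input_ids
        answer_ids = tok(answer, return_tensors="pt",
                         add_special_tokens=False).input_ids
        ids = torch.cat([prompt_ids, answer_ids], dim=-1)
        with torch.no_grad():
            logits = model(ids).logits
        logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
        answer_lp = logprobs[0, prompt_ids.shape[-1] - 1:].gather(
            -1, answer_ids[0].unsqueeze(-1)).squeeze(-1)
        return answer_lp.mean().item()

    print(answer_confidence("The capital of France is", " Paris"))

Low values don't map cleanly onto "I don't know", but they're at least a signal the serving layer could surface.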

tyre 2 days ago | parent [-]

Anthropic has a great paper on exactly this!

https://www.anthropic.com/research/language-models-mostly-kn...

The best is its plummeting confidence when beginning the answer to “Why are you alive?”

Big same, Claude.

smt88 2 days ago | parent | prev [-]

That's not true for all types of questions. You've likely seen a model decline to answer a question that requires more recent training data than it has, for example.