killerstorm 6 days ago

"I don't know" is one of possible answers.

An LLM can be trained to produce "I don't know" when its confidence in the other answers is low (e.g. weak or mixed signals). A persona vector can also nudge it in that direction.
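As a rough sketch of the training-data side only (field names and examples are just illustrative, not from any specific paper): questions with no supporting signal get an explicit abstention target, so "I don't know" becomes a learnable output rather than something the model has to improvise.

    # Sketch: map QA pairs to fine-tuning examples; unanswerable or
    # weakly-supported questions get an explicit abstention target.
    UNANSWERABLE = "I don't know."

    def to_sft_example(question: str, answer: str | None) -> dict:
        target = answer if answer else UNANSWERABLE
        return {"prompt": f"Q: {question}\nA:", "completion": f" {target}"}

    examples = [
        to_sft_example("What is the capital of France?", "Paris"),
        # No signal available -> train the model to abstain.
        to_sft_example("What did I eat for breakfast on 2003-04-17?", None),
    ]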

petesergeant 6 days ago | parent [-]

> LLM can be trained to produce "I don't know" when confidence in other answers is weak

I'm unaware of -- and would love to find some -- convincing studies showing that LLMs have any kind of internal confidence metric. The closest I've seen is reflective chain-of-thought after the fact, and attempts to use per-token selection scores, which are doomed to fail (see: https://vlmsarebiased.github.io/)
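To be concrete about what I mean by per-token selection scores (a sketch only; the model name and threshold-free averaging are illustrative, not from the link above): you average the log-probabilities the model assigned to its own sampled tokens and call that "confidence". The problem is that a fluent, confidently wrong answer scores just as high as a correct one.

    # Sketch: mean per-token logprob as a naive confidence proxy,
    # using a HuggingFace causal LM (gpt2 here as a stand-in).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def answer_with_token_confidence(prompt: str, max_new_tokens: int = 30):
        inputs = tok(prompt, return_tensors="pt")
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            return_dict_in_generate=True,
            output_scores=True,
        )
        # Log-prob of each generated token under the model's own distribution.
        gen_ids = out.sequences[0, inputs["input_ids"].shape[1]:]
        logprobs = []
        for step_scores, tok_id in zip(out.scores, gen_ids):
            step_logprobs = torch.log_softmax(step_scores[0], dim=-1)
            logprobs.append(step_logprobs[tok_id].item())
        mean_logprob = sum(logprobs) / len(logprobs)
        return tok.decode(gen_ids, skip_special_tokens=True), mean_logprob

    text, score = answer_with_token_confidence("Q: Who wrote Hamlet?\nA:")
    # A high score here measures fluency under the model, not factual accuracy.
    print(text, score)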