thatjoeoverthr 21 hours ago
There are a few problems with an „I don’t know” sample. For starters, what does it map to? Recall that the corpus consists of information we have (affirmatively). You would need to invent a corpus of false stimuli. What you would end up with, then, is a model that writes „I don’t know” based on whether the stimulus better matches something real or one of the negatives. You can detect this with some test-time compute architectures or pre-inference search, but that’s the broader application; this is a trick for the model alone.
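A minimal sketch of the kind of negative corpus this would require, assuming a simple prompt/completion JSONL format for fine-tuning (the schema, file name, and example questions are invented for illustration, not any particular lab's pipeline):

```python
# Hypothetical sketch: pair real questions with their answers, and invented
# "false stimuli" (plausible-looking but unanswerable questions) with a refusal.
import json
import random

# Questions the corpus actually answers (information we have affirmatively).
known_qa = [
    {"question": "What year did the Apollo 11 mission land on the Moon?",
     "answer": "1969"},
    {"question": "What is the chemical symbol for gold?",
     "answer": "Au"},
]

# Invented negatives: questions with no grounding in the corpus. The model is
# trained to map these to a refusal instead of a guess.
unanswerable = [
    "What year did the Apollo 23 mission land on the Moon?",
    "What is the capital of the fictional country of Freedonia?",
]

REFUSAL = "I don't know."

def build_dataset(path="idk_finetune.jsonl"):
    rows = [{"prompt": qa["question"], "completion": qa["answer"]} for qa in known_qa]
    rows += [{"prompt": q, "completion": REFUSAL} for q in unanswerable]
    random.shuffle(rows)
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return len(rows)

if __name__ == "__main__":
    print(f"wrote {build_dataset()} training rows")
```

The catch the comment points at: the model only learns to say „I don’t know” when a prompt resembles one of these invented negatives, not when it genuinely lacks knowledge.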
dlivingston 14 hours ago | parent
The Chain of Thought in reasoning models (o3, R1, ...) will actually express some self-doubt and backtrack on ideas. That tells me there's at least some capability for self-doubt in LLMs.