adastra22 an hour ago
They do know what they don't know. There's a probability distribution for outputs that they are sampling from. That just isn't being used for that purpose.
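A minimal sketch of that point, assuming a Hugging Face causal LM ("gpt2" here only as a small stand-in): at every decoding step the full next-token distribution is sitting right there, whether or not anything downstream reads it as a confidence signal.

    # Inspect the next-token distribution that decoding normally discards.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("The capital of Australia is", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # scores for the next token

    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()

    top = torch.topk(probs, 5)
    for p, idx in zip(top.values, top.indices):
        print(f"{tokenizer.decode(idx)!r}: {p:.3f}")
    print(f"entropy (nats): {entropy:.2f}")  # one rough uncertainty signal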
dhampi 26 minutes ago
Well, with thinking models, it’s not that simple. The probability distribution is over the next token. But when a model thinks before producing an answer, you can get a high-confidence next token even though Monte Carlo sampling over the model’s thinking chains would reveal that the real distribution over answers has low confidence.
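To sketch the distinction (sample_chain here is a hypothetical helper standing in for one temperature>0 generation, reduced to the final answer parsed out of the chain): answer-level confidence comes from sampling whole chains, not from reading any single next-token probability.

    # Monte Carlo estimate of P(answer | prompt) over reasoning chains.
    from collections import Counter

    def answer_confidence(prompt, sample_chain, n=50):
        counts = Counter(sample_chain(prompt) for _ in range(n))
        return {ans: c / n for ans, c in counts.most_common()}

    # Each individual chain may end in a token emitted with probability
    # ~0.99, yet if 50 chains split 30/20 between two answers, the
    # answer-level confidence is only ~0.6.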
raddan an hour ago
I’m not clear what you mean by “know.” If you mean “the information is in the model,” then I mostly agree: distributional information is represented somewhere. But if you mean that a model can actually access this information in a meaningful and accurate way—say, to state its confidence level—I don’t think that’s true. There is a stochastic process sampling from those distributions, but can the process introspect? That would be a very surprising capability.
| ||||||||||||||
Isamu an hour ago
Oh, you mean somewhere it is tracking the statistical likelihood of the output. Yeah, I buy that, although I think it just tends towards the most likely output given the context it’s dragging along. I mean, it wouldn’t deliberately choose something really statistically unlikely; that would be like a non sequitur.
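A toy example makes that concrete (the logits are made up, not from any real model): under standard temperature sampling the low-probability “non sequitur” token almost never comes out, and greedy decoding would never pick it at all.

    import numpy as np

    rng = np.random.default_rng(0)
    logits = np.array([4.0, 2.0, 0.0, -2.0])  # toy next-token scores

    def sample(logits, temperature=1.0):
        p = np.exp(logits / temperature)  # softmax with temperature
        p /= p.sum()
        return rng.choice(len(logits), p=p)

    draws = [sample(logits) for _ in range(10_000)]
    print(np.bincount(draws, minlength=4) / len(draws))
    # ~ [0.86, 0.12, 0.016, 0.002]: index 3 almost never appears,
    # and lowering the temperature suppresses it even further.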
| ||||||||||||||