adastra22 an hour ago

They do know what they don't know. There's a probability distribution for outputs that they are sampling from. That just isn't being used for that purpose.

dhampi 26 minutes ago | parent | next [-]

Well, with thinking models, it’s not that simple. The probability distribution is over the next token. But if a model thinks before producing an answer, you can get a high-confidence next token even when sampling many of the model’s thinking chains would reveal that the real answer distribution had low confidence.
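To make the distinction concrete, here is a minimal sketch of estimating answer confidence by Monte Carlo resampling of whole reasoning chains (what the comment loosely calls MCMC), rather than reading it off one chain's final-token distribution. `sample_answer` is a hypothetical stand-in for running the model once and extracting its final answer; the hard-coded choice list just simulates chains that frequently disagree.

```python
import random
from collections import Counter

def sample_answer(rng):
    # Hypothetical stand-in for sampling one full reasoning chain and
    # reading off its final answer; a real model would be queried here.
    # The weighting simulates chains that disagree about half the time.
    return rng.choice(["A", "A", "B", "C"])

def chain_confidence(n=1000, seed=0):
    # Estimate the answer distribution by resampling whole chains,
    # instead of trusting the next-token distribution of one chain,
    # which can look near-certain even when chains disagree.
    rng = random.Random(seed)
    counts = Counter(sample_answer(rng) for _ in range(n))
    top, freq = counts.most_common(1)[0]
    return top, freq / n

answer, conf = chain_confidence()
print(answer, round(conf, 2))  # majority answer, but with weak support
```

Any single chain would emit its final answer token with high probability, while the estimate above shows the model's real confidence across chains is far lower.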

raddan an hour ago | parent | prev | next [-]

I’m not clear what you mean by “know.” If you mean “the information is in the model” then I mostly agree, distributional information is represented somewhere. But if you mean that a model can actually access this information in a meaningful and accurate way—say, to state its confidence level—I don’t think that’s true. There is a stochastic process sampling from those distributions, but can the process introspect? That would be a very surprising capability.

kneyed 41 minutes ago | parent [-]

yes:

> In this experiment, however, the model recognizes the injection before even mentioning the concept, indicating that its recognition took place internally.

https://www.anthropic.com/research/introspection

Isamu an hour ago | parent | prev [-]

Oh, you mean somewhere it is tracking the statistical likelihood of the output. Yeah I buy that, although I think it just tends towards the most likely output given the context that it is dragging along. I mean it wouldn’t deliberately choose something really statistically unlikely, that’s like a non sequitur.

adastra22 3 minutes ago | parent | next [-]

Well, it's not tracking. As it predicts each token it is sampling from a probability distribution -- that's what the matrix multiplies are for. It gets a distribution over all tokens and then picks randomly according to that distribution. How flat or how spiky that distribution is tells you how confident it is in its answer.

But it then throws that distribution away / consumes it in the next token calculation. So it's not really tracking it per se.

tempest_ an hour ago | parent | prev [-]

From its point of view, what does it mean "to know"?

Is it a token (or set of tokens) that is strictly > 50% probable, or just the highest probability in a set of probabilities?

While generating bullshit is not ideal for a lot of use cases, you don't want your premier chatbot to say "I don't know" to the general public half the time. The investment in these things requires wide adoption, so they are always going to favour the "guesses".