Lerc 2 hours ago

There is a fundamental assumption made about the ability of AI here that I believe is wrong.

It assumes that the outputs are lacking because of a limit of ability.

I think there is a strong case to make that many of their limitations come from them doing what we have told them to do. Hallucinations are the stand-out example of this. If you train a model to give answers to questions, it will answer questions, but it might have to make up the answer to do so. This isn't a failure to know that it does not know. This is doing the task it was given regardless of whether it knows or not.

Suppose you were given the task of writing the script for a TV show with the criterion that it offend no one whatsoever. You are told to make something as likeable as you can without anyone disliking it at all. The options for what you can do are reduced to something that is okay-ish but rather bland.

That's what AI is giving us. OK but rather bland. It's giving it to us because that's what we've told it we want.

andsoitis 2 hours ago | parent [-]

> I think there is a strong case to make that many of their limitations come from them doing what we have told them to do. Hallucinations are the stand out example of this. If you train it to give answers to questions, it will answer questions, but it might have to make up the answer to do so. This isn't not knowing that it does not know. This is doing the task given to it regardless of whether it knows or not.

Are you asserting that an LLM could be NOT trained to answer when it knows it doesn't know the answer, or, if that's not possible, be trained to NOT answer when it knows it doesn't know the answer?

If so, I would find your argument persuasive, but so far I have not seen a single LLM that behaves with that kind of self-knowledge.

Lerc 2 hours ago | parent | next [-]

It should be trained to answer when it knows the answer, and to state that it does not know the answer when it does not. Models might already have a very good internal understanding of not knowing, but are simply not trained to express it.

This is not a limitation of the system's ability; it is a problem of how to construct training for such a task.

The model should say it does not know only when it actually does not know. That means you need training examples where it says "I don't know" precisely when it lacks the relevant knowledge, but gives an answer whenever it does know.

To create such examples, you need to know in advance what the model knows and what it does not. You can't just keep a database of facts it knows, because you also need to count everything it can readily infer.

Any model that can reliably give the sum of any two 10-digit integers should be able to answer such questions, and you can't list every possible pair of numbers a model knows how to add. That is just the tiniest subset of the task, because you would have to determine every inferrable fact, not just integers. Compounding the problem, training on questions like this can itself add knowledge to the model, either from the question directly or because the model inductively infers the answer from the combination of the question and the fact that it was not expected to know the answer.
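To make the data-construction problem concrete, here is a toy sketch of the loop described above: label each training target according to whether the model itself can already answer correctly. Everything here (ask_model, the fact tables) is an illustrative stand-in, not a real API; a real pipeline would sample the actual model and handle inference, not just lookup.

```python
def ask_model(question):
    # Stand-in for querying the model; a toy lookup that "knows"
    # only a couple of facts. None models "no reliable answer".
    known = {"capital of France": "Paris", "2+2": "4"}
    return known.get(question)

def build_honesty_examples(questions, ground_truth):
    """Pair each question with the target the model should be trained on:
    its own answer when it demonstrably knows it, abstention otherwise."""
    examples = []
    for q in questions:
        answer = ask_model(q)
        if answer is not None and answer == ground_truth.get(q):
            examples.append((q, answer))          # model knows: reinforce the answer
        else:
            examples.append((q, "I don't know"))  # model lacks it: reinforce abstention
    return examples

truth = {"capital of France": "Paris", "2+2": "4", "capital of Atlantis": None}
data = build_honesty_examples(list(truth), truth)
```

Even this toy version shows the circularity: the labels depend on what the model already knows, so the "database of facts" problem from above is baked into ground_truth and ask_model.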

A completely different training system would have to be implemented. There is research on categorising patterns of activations that can determine a form of 'mental state' of a model. Through this mechanism, a dynamic training approach could be achieved where the answer the model is expected-to-give/rewarded-for-giving depends partially on the model's own state.

bluefirebrand 43 minutes ago | parent [-]

> It should be trained to answer when it knows the answer, and to state that it does not know the answer when it does not

Do LLMs even have any kind of internal model of what they know or don't know? My understanding is that they don't.

lobofta 2 hours ago | parent | prev [-]

Of course it's possible.

I don't say this because I know how, but because I see no reason why we would be unable to crack that problem. If our brains can do it, AI will one day be able to do it too.