Flere-Imsaho 10 hours ago

Models should not have memorised whether animals are kosher to eat or not. This is information that should be retrieved from RAG or whatever.

If a model responded with "I don't know the answer to that", that would be far more useful. Is anyone actually working on models that are trained to admit when they don't know an answer?

spmurrayzzz 10 hours ago | parent | next [-]

There is an older paper on something related to this [1], where the model outputs reflection tokens that either trigger retrieval or critique steps. The idea is that, after generating some factual content, the model recognizes that it needs to fetch grounding, then critiques what it previously generated against the retrieved grounding.

The problem with this approach is that it does not generalize well at all out of distribution. I'm not aware of any follow up to this, but I do think it's an interesting area of research nonetheless.

[1] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, https://arxiv.org/abs/2310.11511
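
Roughly, the generate-retrieve-critique loop from [1] looks something like the sketch below. Every interface here is a hypothetical stand-in; the actual method emits special reflection and critique tokens from the model's own vocabulary rather than calling functions.

    # Sketch of a Self-RAG-style loop. The model/retriever methods are
    # hypothetical; the paper implements this with reflection tokens
    # (e.g. a retrieve token and support-critique tokens) emitted by the LM.

    def generate_with_reflection(model, retriever, prompt: str) -> str:
        output = []
        while True:
            segment, reflection = model.next_segment(prompt, output)
            if segment is None:          # model signalled end of generation
                break
            if reflection == "RETRIEVE":
                passages = retriever.search(prompt, context="".join(output))
                # Critique step: regenerate the segment against each passage
                # and keep the candidate judged best supported.
                candidates = [model.regenerate(segment, p) for p in passages]
                segment = max(candidates, key=model.support_score)
            output.append(segment)
        return "".join(output)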

robrenaud 4 hours ago | parent | prev | next [-]

Benchmarks need to change.

Take a four-choice question. Your best guess is that the answer is B, with about a 35% chance of being right. If you are graded on the fraction of questions answered correctly, the optimization pressure is simply to answer B.

If you could get half credit for answering "I don't know", we'd have a lot more models saying that when they are not confident.
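
A toy version of that incentive, using the half-credit value from the comment above (the 0.5 is an assumption from this thread, not from any real benchmark):

    # Expected score on one four-choice question under a grading scheme
    # that awards partial credit for abstaining.

    def expected_score(confidence: float, abstain: bool, idk_credit: float = 0.5) -> float:
        if abstain:
            return idk_credit
        return confidence  # guess the most likely option

    # With 35% confidence, guessing yields 0.35 in expectation, abstaining 0.5,
    # so a model optimized under this rule abstains whenever confidence < 0.5.
    print(expected_score(0.35, abstain=False))  # 0.35
    print(expected_score(0.35, abstain=True))   # 0.5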

anonym29 5 hours ago | parent | prev | next [-]

>Models should not have memorised whether animals are kosher to eat or not.

Agreed. Humans do not perform rote memorization for all possibilities of rules-based classifications like "kosher or not kosher".

>This is information that should be retrieved from RAG or whatever.

Firm disagreement here. An intelligent model should either know (general model) or retrieve via RAG (non-general model) the criteria for evaluating whether an animal is kosher, and then infer from its knowledge of the animal (again, either built-in or retrieved) whether the animal matches those criteria.

>If a model responded with "I don't know the answer to that", then that would be far more useful.

Again, firm disagreement here. "I don't know" is not a useful answer to a question that can be easily answered by cross-referencing easily-verifiable animal properties against the classification rules. At the very least, an intelligent model should explain which piece of information it is missing (properties of the animal in question OR the details of the classification rules), rather than returning a zero-value response.
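
For the kosher example specifically, "apply the criteria to known properties" is just rule-based classification, and flagging the missing property falls out naturally. A toy sketch: the rule shown is the standard land-mammal criterion (chews cud and has split hooves); the property names and lookup are made up here, and in practice could come from retrieval.

    # Infer from criteria plus animal properties rather than memorizing
    # per-animal answers. Property names/data source are hypothetical.

    def classify_land_mammal(props: dict) -> str:
        required = ("chews_cud", "has_split_hooves")
        missing = [k for k in required if k not in props]
        if missing:
            # Better than a bare "I don't know": name what is missing.
            return f"unknown: missing properties {missing}"
        return "kosher" if all(props[k] for k in required) else "not kosher"

    print(classify_land_mammal({"chews_cud": True, "has_split_hooves": True}))   # cow -> kosher
    print(classify_land_mammal({"chews_cud": False, "has_split_hooves": True}))  # pig -> not kosher
    print(classify_land_mammal({"chews_cud": True}))  # unknown: missing properties ['has_split_hooves']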

To wit: if you were interviewing a developer candidate and asked whether Python supports functions, methods, both, or neither, would "I don't know" ever be an appropriate answer, even if the candidate genuinely didn't know off the top of their head? Of course not - you'd want a candidate who didn't know to say something more along the lines of "I don't know, but here's what I would do to figure out the answer for you".

A plain and simple "I don't know" adds zero value to the conversation. While it doesn't necessarily add negative value the way a confidently incorrect answer does, the goal for intelligent models should never be to produce zero value; it should be to produce nonzero positive value, even when required information is missing.
