gcr an hour ago
Shouldn't that be part of the test? Real-world systems need to be able to say "I don't know." This is a test about misinformation, after all, and overconfident responses contribute to that. Teasing out the difference between "avoid" and "unknown" could be a different research question.