| ▲ | Someone 2 hours ago | |
For those questions, it wouldn’t surprise me at all if five well-educated intelligent humans disagreed on over two out of three of them. I would answer “don’t know” on many, but that’s not an option. | ||
| ▲ | kostaj 2 hours ago | |
Yes, disagreement between human annotators is also high on similar types of questions (AVeriTeC): inter-panel agreement was κ=0.619. I tried giving the models a fifth option, Abstain, but some models seem to use it to "avoid answering hard questions" more than others. | ||
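For anyone unfamiliar with κ, here is a minimal sketch of how a chance-corrected agreement score can be computed. It assumes Cohen's kappa over two panels' labels for the same questions. The comment above does not say which kappa variant was used, so the function name `cohens_kappa` and the toy labels are illustrative only and do not reproduce the 0.619 figure.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and the same length")

    n = len(labels_a)

    # Observed agreement: fraction of items where both panels chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each panel's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both panels used a single identical label; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data, not taken from AVeriTeC: two panels labelling four claims.
panel_a = ["supported", "supported", "refuted", "refuted"]
panel_b = ["supported", "refuted", "refuted", "refuted"]
print(cohens_kappa(panel_a, panel_b))
```

Raw agreement in the toy data is 3 out of 4, but kappa comes out lower because half of that agreement would be expected by chance given how often each panel uses each label.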