Remix clone Hacker News

	▲	Someone 2 hours ago
		For those questions, it wouldn’t surprise me at all if five well-educated intelligent humans disagreed on over two out of three of them. I would answer “don’t know” on many, but that’s not an option.
	▲	kostaj 2 hours ago
		Yes, inter-human-annotator disagreement is also high on similar types of questions (AVeriTeC): inter-panel agreement there is κ=0.619. I tried giving the models a fifth option, Abstain, but some models seem to use it to "avoid answering hard questions" more than others.
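
For readers unfamiliar with κ, here is a minimal sketch of how a chance-corrected agreement score can be computed for two annotators, plus a per-annotator abstain rate. It assumes Cohen's kappa; the thread does not say which κ variant the 0.619 figure uses, and the function names, label strings, and sample data below are hypothetical, for illustration only.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label lists (assumed variant)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement: fraction of items where both labels match.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)

def abstain_rate(labels, abstain_label="Abstain"):
    """Fraction of answers equal to the (hypothetical) abstain label."""
    return sum(label == abstain_label for label in labels) / len(labels)

# Hypothetical answers from two annotators on four questions.
annotator_1 = ["True", "True", "False", "Abstain"]
annotator_2 = ["True", "False", "False", "Abstain"]
print(cohen_kappa(annotator_1, annotator_2))
print(abstain_rate(annotator_1))
```

Comparing `abstain_rate` across models on the same question set is one simple way to see whether some of them lean on Abstain more than others.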