jawns 2 hours ago
"Extraterrestrial life exists somewhere in the universe."

GPT-5.4: Misleading
Opus 4.7: Misleading
Gemini 3: FALSE
Gemini 3 (Retrieval): FALSE
Sonar Pro: FALSE

It's a weird fact claim, because the ground truth is "nobody knows for sure" and that's not one of the available options.
drtz an hour ago
> It's a weird fact claim, because the ground truth is "nobody knows for sure" and that's not one of the available options.

It's even weirder to suggest that the disagreement is indicative of a problem. If you asked five very knowledgeable humans on this subject to select the correct answer on a multiple-choice questionnaire, they would almost certainly vary significantly more than these 5 LLMs.

Not to say that hallucination isn't a problem, but this is a lousy way to test it.
wongarsu 2 hours ago
Of the available options, "Misleading" is probably the best, since something that is most likely true but unproven is presented as fact.

But "unknown or undecidable" should have been a category.
1718627440 21 minutes ago
I would argue FALSE is the correct answer, since this is not a fact you can know for sure. The logical inverse is also FALSE.
jug 36 minutes ago
Looks like an ongoing theme and a very poor benchmark. Not at all the claims I expected.
Alifatisk an hour ago
Isn't "Misleading" the correct option here, then?
mock-possum 22 minutes ago
I would think ‘false’ is the only correct answer, as there’s no evidence to prove the claim, so the claim is safely assumed false. Then again, maybe that’s why I’m an atheist, not an agnostic?