jawns 2 hours ago

"Extraterrestrial life exists somewhere in the universe."

GPT-5.4: Misleading

Opus 4.7: Misleading

Gemini 3: FALSE

Gemini 3 (Retrieval): FALSE

Sonar Pro: FALSE

It's a weird fact claim, because the ground truth is "nobody knows for sure" and that's not one of the available options.

drtz an hour ago | parent | next [-]

> It's a weird fact claim, because the ground truth is "nobody knows for sure" and that's not one of the available options.

It's even weirder to suggest that the disagreement is indicative of a problem. If you asked five very knowledgeable humans on this subject to select the correct answer on a multiple-choice questionnaire, they would almost certainly vary significantly more than these 5 LLMs.

Not to say that hallucination isn't a problem, but this is a lousy way to test it.

wongarsu 2 hours ago | parent | prev | next [-]

Of the available options, "Misleading" is probably the best, since something that is most likely true but unproven is presented as fact.

But "unknown or undecidable" should have been a category.

1718627440 21 minutes ago | parent | prev | next [-]

I would argue FALSE is the correct answer, since this is not a fact you can know for sure. The logical inverse is also FALSE.

jug 36 minutes ago | parent | prev | next [-]

Looks like an ongoing theme and a very poor benchmark. Not at all the claims I expected.

Alifatisk an hour ago | parent | prev | next [-]

Isn't misleading the correct option here then?

drtz 35 minutes ago | parent | next [-]

True or mostly true could easily be argued from a statistical likelihood perspective: life exists on Earth and, based on what we know, Earth doesn't appear to be all that special in a very large universe.

I think you could come up with a reasonable argument for any of the responses, hence the problem with the methodology.

arcfour an hour ago | parent | prev | next [-]

False makes sense if you are interpreting it strictly as "has this been proven?"

wongarsu an hour ago | parent [-]

False is correct, but misleading

My implicit assumption is that if you fact-check the fact-check, any label other than "true" means the original fact-check is unacceptable

throw310822 an hour ago | parent | prev [-]

No, "misleading" describes a statement that is used because it suggests something else. It's a curious category because, unlike true and false, it's not about the statement itself but rather the intention behind its usage or the way it might be understood. It's frankly more of a political judgement than a matter of fact.

mock-possum 22 minutes ago | parent | prev [-]

I would think ‘false’ is the only correct answer, as there’s no evidence to prove the claim, so the claim is safely assumed false.

Then again maybe that’s why I’m an atheist, not an agnostic?