This is exaggerated. Here's what happened:

Edit: I don't think it's exaggerated, and I think it's important.

1. They invented a new disease and published a preprint (with some clues inside it implying that it was fake).

2. They asked the agent what it thinks about this preprint.

3. It just assumed that it was true. What was it supposed to do? It was published in a credentialised way!

It *DID NOT* recommend this disease to people who didn't mention this specific disease. Edit: I'm wrong here. It did pop up without prompting.

It just committed the sin of assuming something is true when published.

What is the recommendation here? Should the agent take everything published in a skeptical way? I would agree with that, but it comes with its own compute constraints. In general, LLMs are trained to treat credentialised sources as more likely to be true. Sometimes this breaks in edge cases, like this test.

Certhas 3 hours ago

As per the article, you are wrong:

> Some of those [LLM] responses were prompted by asking about bixonimania, and others were in response to questions about hyperpigmentation on the eyelids from blue-light exposure.

Also, this was a non-peer-reviewed paper from a person affiliated with a non-existent university, and it includes the sentences:

“this entire paper is made up”

and

“Fifty made-up individuals aged between 20 and 50 years were recruited for the exposure group”.

and thanks

“the Professor Sideshow Bob Foundation for its work in advanced trickery. This works is a part of a larger funding initiative from the University of Fellowship of the Ring and the Galactic Triad”

	simianwords 3 hours ago
		I may be wrong here; thanks for the correction.

ayhanfuat 3 hours ago

> Even if readers didn’t make it all the way to the ends of the papers, they would have encountered red flags early on, such as statements that “this entire paper is made up” and “Fifty made-up individuals aged between 20 and 50 years were recruited for the exposure group”.

> What is the recommendation here? Should the agent take everything published in a skeptical way?

Not everything. Maybe some things that are explicitly called made-up.

	simianwords 3 hours ago
		I agree, but again: LLMs are trained to be more forgiving of things published in places with a good reputation. There are two options:
		1. Even if an article is published in a place with a good reputation, the LLM is equally skeptical and uses test-time compute to process it further.
		2. Accept the tradeoff where the LLM by default accepts things published in high-reputation sources as true, so that it doesn't waste processing power but might miss edge cases like this one.
		Which one would you prefer?