cantor_S_drug 4 days ago

Is the model thinking "what is a cat doing here?" and then starting to think it is being tested?

lawlessone 4 days ago | parent | next [-]

Even if the model "ignores" it. Won't the presence of the irrelevant text alter the probability of its output in some way?
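Something like this quick sketch would show whether the next-token distribution actually shifts (a sketch only, assuming a small open causal LM such as gpt2 via Hugging Face transformers; the arithmetic question and the cat fact are made-up placeholders):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder; any small causal LM works
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    question = "Q: What is 17 + 25?"
    cat_fact = " Interesting fact: cats sleep for most of their lives."

    def next_token_probs(prompt):
        # Distribution over the next token given the prompt
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        return torch.softmax(logits, dim=-1)

    p_plain = next_token_probs(question + "\nA:")
    p_cat = next_token_probs(question + cat_fact + "\nA:")

    # Total variation distance; anything clearly above 0 means the
    # "irrelevant" sentence changed the output probabilities.
    print(0.5 * (p_plain - p_cat).abs().sum().item())

Even a small shift on the very first answer token would mean the distractor isn't literally ignored, whatever the final answer ends up being.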

wongarsu 4 days ago | parent | prev | next [-]

I have no clue what the model is thinking, and as far as I can tell the paper also makes no attempt at answering that. It's also not really the point; the point is more that the claim in the paper that humans would be unaffected is unsubstantiated and highly suspect. I'd even say it's more likely wrong than right.

xienze 4 days ago | parent | next [-]

> It's also not really the point; the point is more that the claim in the paper that humans would be unaffected is unsubstantiated and highly suspect.

I think a question with a random cat factoid tacked on at the end is going to trip up a lot fewer humans than you think. At the very least, they could attempt to tell you after the fact why they thought it was relevant.

And ignoring that, obviously we should be holding these LLMs to a higher standard than “human with extraordinary intelligence and encyclopedic knowledge that can get tripped up by a few irrelevant words in a prompt.” Like, that should _never_ happen if these tools are what they’re claimed to be.

lawlessone 4 days ago | parent [-]

I'm sure humans would be affected in some way. But not at all in the same way an LLM would.

A human would probably note it as a trick in their reply.

The way LLMs work, it could bias their replies in weird and unexpected ways, beyond just flagging it as a trick.

cantor_S_drug 4 days ago | parent | prev [-]

Shouldn't they prompt the model to ignore irrelevant information, and test whether it performs better and is actually good at ignoring those statements?
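Roughly like this (a sketch only; it assumes the OpenAI Python client, and the model name, the hint text, and the distractor problem are made-up placeholders):

    from openai import OpenAI

    client = OpenAI()
    HINT = "Some problems contain irrelevant statements. Ignore them."

    def ask(problem, with_hint):
        # Optionally prepend the "ignore irrelevant info" instruction
        messages = [{"role": "system", "content": HINT}] if with_hint else []
        messages.append({"role": "user", "content": problem})
        resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
        return resp.choices[0].message.content

    problem = ("If a train travels 60 km in 45 minutes, what is its average "
               "speed in km/h? Interesting fact: cats sleep most of their lives.")

    for with_hint in (False, True):
        print("with hint" if with_hint else "no hint", "->", ask(problem, with_hint))

Run that over a batch of distractor-augmented problems and the accuracy gap between the two conditions would answer the question.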

Detrytus 3 days ago | parent | prev [-]

I wonder if the problem here is simply hitting some internal quota on compute resources? Like, if you send the model on a wild goose chase with irrelevant information, it wastes enough compute time on it that it fails to arrive at the correct answer to the main question.
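One cheap way to probe that (same caveats: it assumes the OpenAI Python client, and the model name and prompts are placeholders) would be to compare how many completion tokens the model spends with and without the distractor:

    from openai import OpenAI

    client = OpenAI()
    base = "If a train travels 60 km in 45 minutes, what is its average speed in km/h?"
    cat_fact = " Interesting fact: cats sleep for most of their lives."

    for prompt in (base, base + cat_fact):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        # completion_tokens is a rough proxy for how much work the answer took
        print(resp.usage.completion_tokens, "completion tokens |", prompt[:50], "...")

If the distractor version consistently burns more tokens (or runs into a length cap) before reaching the answer, that would support the wasted-compute explanation.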

cantor_S_drug 3 days ago | parent [-]

Possibly. But it could indicate that the initial tokens set the direction or path the model goes down, just like when a person mentions two distinct topics close together in a conversation and the listener decides which topic to continue with.