wongarsu 4 days ago

If asked verbally, that would absolutely confuse some humans, easily enough to triple the error rate for that specific question (granted, that's easier than the actual questions, but still). Even in a written test with time pressure it would probably still have a statistically significant effect.

kazinator 4 days ago | parent | next [-]

The problem with your reasoning is that some humans cannot solve the problem even without the irrelevant info about cats.

We can easily cherry pick our humans to fit any hypothesis about humans, because there are dumb humans.

The issue is that AI models which, on the surface, appear similar to the smarter quantile of humans at solving certain problems become confused in ways that humans in that problem-solving class would not be.

That's obviously because the language model is not generally intelligent; it's just retrieving tokens from a high-dimensional, statistically fit function. The extra info injects noise into the calculation, which confounds it.

krisoft 4 days ago | parent | next [-]

> We can easily cherry pick our humans to fit any hypothesis about humans, because there are dumb humans.

Nah. You would take a large number of humans, have half of them take the test with the distracting statements and half without, and then compare their results statistically. Yes, there would be some dumb ones, but as long as you test enough people they would show up in both samples at roughly the same rate.
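Roughly like this, as a sketch of the comparison (all counts here are hypothetical placeholders, just to show the shape of the test):

```python
# Hypothetical version of the experiment described above: half the
# participants get the plain question, half get the version with the
# irrelevant cat statement, and we compare accuracy between the groups.
from scipy.stats import chi2_contingency

#              correct  incorrect
control    = [412,      88]    # no distracting statement (made-up counts)
distractor = [389,     111]    # cat fact appended to the question (made-up counts)

chi2, p, dof, expected = chi2_contingency([control, distractor])
print(f"chi2={chi2:.2f}, p={p:.4f}")
# A small p would mean the distractor measurably changed accuracy; the
# "dumb humans" land in both groups at roughly the same rate, so they
# cancel out rather than confound the comparison.
```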

> become confused in ways that humans in that problem-solving class would not be.

You just state the same thing others are disputing. Do you think it will suddenly become convincing if you write it down a few more times?

Kuinox 4 days ago | parent | prev [-]

That's obviously because the brain is not generally intelligent; it's just retrieving concepts from a high-dimensional, statistically fit function. The extra info injects noise into the calculation, which confounds it.

kazinator 4 days ago | parent | next [-]

The problem with your low-effort retort is that, for example, the brain can wield language without having to scan anywhere near hundreds of terabytes of text. People acquire language from vastly fewer examples, and are able to infer or postulate rules, and to articulate those rules.

We don't know how.

While there may be activity going on in the brain that is interpretable as high-dimensional functions mapping inputs to outputs, it is not doing everything with just one fixed function evaluating static weights in a feed-forward network.

If it is like neural nets, it might be something like numerous models of different types, dynamically evolving and interacting.

Kuinox 3 days ago | parent [-]

The problem with your answer is that you make assertions resting on logical fallacies. Neither of us knows how LLMs or brains work to produce their output. Any assertion about that without proof is claiming things without any basis.

For example, in this response:

> the brain can wield language without having to scan anywhere near hundreds of terabytes of text.

The amount of text needed to train an LLM only goes down; even two years ago it was shown that you need less than a few million words to "acquire" English: https://tallinzen.net/media/papers/mueller_linzen_2023_acl.p...

kazinator 3 days ago | parent [-]

Training the weights of the neural network produces a humungous function with a vast number of parameters.

Such a function is not inherently mysterious due to the size alone. For instance, if we fit a billion numeric points to a polynomial curve having a billion coefficients, we would not be mystified as to how the polynomial interpolates between the points.
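As a small-scale sketch of that analogy (arbitrary toy data, nothing more):

```python
# Fit a degree-(n-1) polynomial exactly through n points, then evaluate it
# between them. The point is only that the fitted function is not mysterious
# merely because it has many coefficients.
import numpy as np

x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x)            # ten "training" points

coeffs = np.polyfit(x, y, deg=9)     # 10 points, 10 coefficients: exact interpolation
x_mid = 0.5 * (x[:-1] + x[1:])       # query points the fit was never given

print(np.polyval(coeffs, x))         # reproduces the fitted points (up to rounding)
print(np.polyval(coeffs, x_mid))     # values in between are fully determined by the
                                     # construction, even if nothing guarantees they
                                     # are "good" estimates of the underlying curve
```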

Be that as it may, the trained neural network function does have mysterious properties, that is true.

But that doesn't mean we don't know how it works. We invented it and produced it by training.

To say that we completely don't understand it is like saying we don't understand thermodynamics because the laws of thermodynamics don't allow us to predict the path of a gas particle in a container, and so we must remain mystified as to how the gas can take on the shape of the container.

Say we train a neural network to recognize digit characters. Of course we know why it produces the answer 3 when given any one of our training images of 3: we iterated on bumping the weights until it did that. When we give it an image of 3 that is not in our training set and it produces some answer (either correctly 3, or something disappointing), we are less sure. We don't know the exact properties of the multi-dimensional function which encode the "threeness" of the image.
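Something like this toy sketch, say with scikit-learn's small built-in digits set standing in for a real training run:

```python
# Toy version of the digit example: training accuracy is high because the
# weights were adjusted until it was; accuracy on unseen images is the part
# whose "why" is harder to pin down.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

print("train accuracy:", clf.score(X_train, y_train))  # near 1.0 by construction
print("test accuracy: ", clf.score(X_test, y_test))    # good, but the encoded "threeness"
                                                       # that makes it good is not explicit
```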

Sure; so what? It's a heck of a lot more than we know about how a person recognizes a 3, where we had no design input, and don't even know the complete details of the architecture. We don't have a complete model of just one neuron, whereas we do have a complete model of a floating-point number.

Gas in a container is a kind of brain which figures out how to mimic the shape of the container using a function of a vast number of parameters governing the motion of particles. Should we be mystified and declare that we don't understand the thermodynamic laws we came up with because they don't track the path taken by a particle of gas, and don't explain how every particle "knows" where it is supposed to be so that the gas takes on the shape of the container and has equal pressure everywhere?

Kuinox 2 days ago | parent [-]

> we would not be mystified as to how the polynomial interpolates between the points.

We would not be surprised, but we wouldn't know how the model solves the problem. We wouldn't know whether it is approximating, calculating the correct value, or memorising results. We would only know how it was built. We would still be mystified as to how it solved the problem.

> But that doesn't mean we don't know how it works. We invented it and produced it by training.

The fact that we invented it does not mean we know how it works. The fallacy in your reasoning is thinking that emergent behavior or properties can be trivially explained by knowing the building blocks.

const_cast 4 days ago | parent | prev [-]

Yes, how... obvious?

I don't know, do we even know how the brain works? Like, definitively? Because I'm pretty sure we don't.

Kuinox 3 days ago | parent [-]

Yeah, we don't. That's one of the points of my reply: we don't know how LLMs work either.

cantor_S_drug 4 days ago | parent | prev | next [-]

Is the model thinking "what is a cat doing here?" and then starting to think it is being tested?

lawlessone 4 days ago | parent | next [-]

Even if the model "ignores" it, won't the presence of the irrelevant text alter the probability of its output in some way?

wongarsu 4 days ago | parent | prev | next [-]

I have no clue what the model is thinking, and as far as I can tell the paper also makes no attempt at answering that. It's also not really the point; the point is more that the claim in the paper that humans would be unaffected is unsubstantiated and highly suspect. I'd even say it's more likely wrong than right.

xienze 4 days ago | parent | next [-]

> It's also not really the point; the point is more that the claim in the paper that humans would be unaffected is unsubstantiated and highly suspect.

I think the question that adds a random cat factoid at the end is going to trip up a lot fewer humans than you think. At the very least, they could attempt to tell you after the fact why they thought it was relevant.

And ignoring that, obviously we should be holding these LLMs to a higher standard than “human with extraordinary intelligence and encyclopedic knowledge that can get tripped up by a few irrelevant words in a prompt.” Like, that should _never_ happen if these tools are what they’re claimed to be.

lawlessone 4 days ago | parent [-]

I'm sure humans would be affected in some way, but not at all in the same way an LLM would be.

A human would probably note it as a trick in their reply.

The way LLMs work, it could bias their replies in weird and unexpected ways, beyond merely noting it as a trick.

cantor_S_drug 4 days ago | parent | prev [-]

Shouldn't they prompt the model to ignore irrelevant information and test whether the model performs better, i.e. whether it is actually good at ignoring those statements?

Detrytus 3 days ago | parent | prev [-]

I wonder if the problem here is simply hitting some internal quota on compute resources. Like, if you send the model on a wild goose chase with irrelevant information, it wastes enough compute time on it that it fails to arrive at the correct answer to the main question.

cantor_S_drug 3 days ago | parent [-]

Possibly. But it could indicate that the initial tokens set the direction or the path the model goes down. Just like when a person mentions two distinct topics close together in conversation, the listener decides which topic to continue with.

lawlessone 4 days ago | parent | prev [-]

A human would immediately identify it as a trick.