▲ | wongarsu 4 days ago |
If asked verbally, that would absolutely confuse some humans, easily enough to triple the error rate for that specific question (granted, it's easier than the actual questions, but still). Even in a written test under time pressure, it would probably still have a statistically significant effect.
▲ | kazinator 4 days ago | parent | next [-]
The problem with your reasoning is that some humans cannot solve the problem even without the irrelevant info about cats. We can easily cherry-pick our humans to fit any hypothesis about humans, because there are dumb humans. The issue is that AI models which, on the surface, appear similar to the smarter quantile of humans at solving certain problems become confused in ways that humans in that problem-solving class would not be. That's obviously because the language model is not generally intelligent; it's just retrieving tokens from a high-dimensional, statistically fitted function. The extra info injects noise into the calculation, which confounds it.
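The "noise injection" claim is measurable, by the way: run the same questions with and without an irrelevant sentence appended and compare accuracy. A minimal sketch in Python (the model call is a stub I'm assuming you'd swap for a real API client; the questions and the cat fact are made up for illustration):

    # Minimal sketch of a paired distractor eval. stub_model is a
    # stand-in so the script runs; a real eval would call an LLM here.

    CAT_FACT = " Interesting fact: cats sleep for most of their lives."

    def accuracy(ask_model, questions, distract):
        # Fraction of questions answered correctly, optionally with an
        # irrelevant sentence appended to each prompt.
        correct = 0
        for question, expected in questions:
            prompt = question + (CAT_FACT if distract else "")
            if ask_model(prompt).strip() == expected:
                correct += 1
        return correct / len(questions)

    QUESTIONS = [
        ("What is 17 * 24? Answer with only the number.", "408"),
        ("A train covers 60 km in 45 minutes. What is its speed "
         "in km/h? Answer with only the number.", "80"),
    ]

    def stub_model(prompt):
        # Placeholder "model" that always answers correctly, so the
        # harness itself can be tested end to end before wiring in an API.
        return "408" if "17 * 24" in prompt else "80"

    if __name__ == "__main__":
        print("baseline:  ", accuracy(stub_model, QUESTIONS, distract=False))
        print("distractor:", accuracy(stub_model, QUESTIONS, distract=True))

With a real model plugged in, a drop in the distractor condition over enough questions is exactly the confounding effect being discussed.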
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ | cantor_S_drug 4 days ago | parent | prev | next [-]
Is the model thinking, "What is a cat doing here?" and then starting to suspect it is being tested?
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ | lawlessone 4 days ago | parent | prev [-]
A human would immediately identify it as a trick.