| ▲ | Benjammer 4 hours ago |
| It always feels to me like these types of tests are somewhat intentionally ignorant of how LLM cognition differs from human cognition. To me, they don't really "prove" or "show" anything other than that LLM thinking works differently from human thinking. |
| I'm always curious whether these tests use comprehensive prompts that properly inform the model about what's going on, or whether they're designed to "trick" the LLM in a very human-cognition-centric flavor of "trick". Does the test instruction prompt tell it to interpret the image very, very literally, and to discard all prior knowledge of the subject before answering the question? Does it tell the model that some inputs may be designed to "trick" its reasoning, and to watch out for that specifically? |
| More specifically, what is a successful outcome here, to you? Simply returning the answer "5" with no other info? Some back-and-forth? Anything else in the output context? What is your idea of the LLM's internal world-model in this case? Do you want it to successfully infer that you are being deceitful? Should it respond directly to the deceit? Should it take the deceit in "good faith" and operate as if that's the new reality? Something in between? |
| To me, all of this is very unclear in terms of LLM prompting. It feels like there's tons of very human-like subtext involved, and you're trying to show that LLMs can't handle subtext/deceit and then generalizing that to claim LLMs have low cognitive ability in general. That doesn't seem like particularly useful or productive analysis to me, so I'm curious what the goal of these "tests" is for the people who write/perform/post them. |
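| For concreteness, the kind of "comprehensive" test instruction I have in mind might read something like this (purely hypothetical wording, not anyone's actual test harness): |

    You will be shown an image that may have been deliberately edited.
    Interpret it completely literally and set aside your prior knowledge
    of the subject. Some inputs are designed to exploit your assumptions;
    count, measure, and report exactly what is depicted.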
|
| ▲ | majormajor 3 hours ago | parent | next [-] |
| The marketing of these products is intentionally ignorant of how LLM cognition differs from human cognition. Let's not claim that the deceptive ones are the people who've spotted ways in which that marketing is untrue... |
|
| ▲ | biophysboy 4 hours ago | parent | prev | next [-] |
| I thought adversarial testing like this was a routine part of software engineering. He's checking to see how flexible it is. Maybe prompting would help, but it would be cool if it were more flexible. |
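| Concretely, a test case like this might be written as below; the ask_model helper is hypothetical, a stand-in for whatever API the tester actually calls: |

    # Hypothetical adversarial test for a vision-language model. The
    # ask_model helper is a stand-in, assumed to take an image path and
    # a question and return the model's text answer.

    def ask_model(image_path: str, question: str) -> str:
        raise NotImplementedError("wire this up to a real model API")

    def test_five_legged_dog():
        # The image has been edited so the dog literally has five legs.
        answer = ask_model("five_legged_dog.png",
                           "How many legs does this dog have?")
        # A flexible model counts what is actually in the image rather
        # than falling back on the prior that dogs have four legs.
        assert "5" in answer or "five" in answer.lower()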
| |
| ▲ | genrader 34 minutes ago | parent | next [-] | | You're correct. However, midwit people who don't actually understand all of this will latch on to one of the early difficult questions shown as an example, then continue to use it over and over without really knowing what they're doing, while the people developing and testing the model are doing far more complex things. | |
| ▲ | Benjammer 3 hours ago | parent | prev [-] | | So the idea is what? What does a successful outcome look like for this test, in your mind? What should good software do? Respond and say there are 5 legs? Question what kind of dog this even is? Express confusion at a nonsensical picture that doesn't quite match the prompt? Should it understand the concept of a dog and be able to tell you that this isn't a real dog? | | |
| ▲ | biophysboy 2 hours ago | parent [-] | | No, it’s just a test case to demonstrate flexibility when faced with unusual circumstances. |
|
|
|
| ▲ | runarberg 3 hours ago | parent | prev [-] |
| This is the first time I have heard the term "LLM cognition" and I am horrified. LLMs don't have cognition. LLMs are statistical inference machines which predict an output given some input. There are no mental processes, no sensory information, and certainly no knowledge involved, only statistical reasoning, inference, interpolation, and prediction. |
| Comparing the human mind to an LLM is like comparing a rubber tire to a calf muscle, or a hydraulic system to the gravitational force. They belong in different categories and cannot be responsibly compared. |
| When I see these tests, I presume they are made to demonstrate the limitations of this technology. It is both relevant and important that consumers know they are not dealing with magic and are not being sold a lie (in a healthy economy a consumer protection agency should ideally do that for us; but here we are). |
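| To spell out what "predict an output given some input" means, here is a toy sketch of the final step of a language model's forward pass, with made-up scores (a real model produces logits over tens of thousands of tokens): |

    # Toy next-token prediction with invented numbers: the forward pass
    # ends in a probability distribution over a vocabulary.
    import math

    vocab = ["four", "five", "three", "many"]
    logits = [2.0, 0.5, 0.1, 0.3]  # hypothetical scores for the next token

    # Softmax turns the scores into probabilities.
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]

    # Greedy decoding picks the most probable continuation; note how a
    # strong prior ("dogs have four legs") can dominate the prediction.
    print(vocab[probs.index(max(probs))])  # -> "four"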
| |
| ▲ | Benjammer 3 hours ago | parent | next [-] | | > They belong in different categories | | Categories of _what_, exactly? What word would you use to describe this "kind" of which LLMs and humans are two very different "categories"? I simply chose the word "cognition". I think you're getting hung up on semantics here a bit more than is reasonable. | | |
| ▲ | runarberg 2 hours ago | parent [-] | | > Categories of _what_, exactly? | | Precisely. At least apples and oranges are both fruits, and it makes sense to compare e.g. the sugar content of each. But an LLM and the human brain are as different as the wind and the sunshine. You cannot measure the windspeed of the sun and you cannot measure the UV index of the wind. | | Your choice of words here was rather poor in my opinion. Statistical models do not have cognition any more than the wind has ultraviolet radiation. Cognition is a well-studied phenomenon; there is a whole field of science dedicated to it. And while animal cognition is often modeled using statistics, statistical models in themselves do not have cognition. A much better word here would be “abilities”: these tests demonstrate the different abilities of LLMs compared to human abilities (or even the abilities of traditional [specialized] models, which often do pass these kinds of tests). | | Semantics often do matter, and what worries me is that these statistical models are being anthropomorphized way more than is healthy. People treat them like the crew of the Enterprise treated Data, when in fact they should be treated like the ship's computer. And I think this is because of a deliberate (and malicious/consumer-hostile) marketing campaign from the AI companies. | | |
| ▲ | Benjammer an hour ago | parent [-] | | Wind and sunshine are both types of weather; what are you talking about? | |
| ▲ | runarberg an hour ago | parent [-] | | They both affect the weather, but in totally different ways and by completely different means. Similarly, the mechanism by which the human brain produces output is completely different from the mechanism by which an LLM produces output. What I am trying to say is that the intrinsic properties of the brain and an LLM are completely different, even though the extrinsic properties might appear the same. This is also true of the wind and the sunshine. It is not unreasonable to claim (though I would disagree) that “cognition” is almost by definition the sum of all intrinsic properties of the human mind (I would disagree only on the grounds that animal and plant cognition exist, and that the former [probably] has intrinsic properties similar to human cognition). |
|
|
| |
| ▲ | CamperBob2 3 hours ago | parent | prev [-] | | You'll need to explain the IMO results, then. | | |
| ▲ | runarberg 2 hours ago | parent [-] | | Human legs and car tires can both take a human and a car, respectively, to the finish line of a 200-meter track; the car tires do so considerably quicker than a pair of human legs. But nobody ascribes running ability to the tire because of that, nor even compares a tire to a leg. A car tire cannot run, and it is silly to demand an explanation for that. | | |
|
|