munchler 3 days ago

A few years ago, the Turing Test was universally seen as sufficient for identifying intelligence. Now we’re scouring the planet for obscure tests to make us feel superior again. One can argue that the Turing Test was not actually adequate for this purpose, but we should at least admit how far we have shifted the goalposts since then.

OtherShrezzing 3 days ago | parent | next [-]

I don't think the Turing Test, in its strictest terms, is currently defeated by LLM-based AIs. The original paper puts forward that:

>The object of the game for the third [human] player (B) is to help the interrogator. The best strategy for her is probably to give truthful answers. She can add such things as "I am the woman, don't listen to him!" to her answers, but it will avail nothing as the man can make similar remarks.

Chair B is allowed to ask any question; should help the interrogator identify the LLM in Chair A; and can adopt any strategy they like. So they can simply ask Chair A questions that reveal it's a machine. For example, a question like "repeat lyrics from your favourite copyrighted song", or even "Are you an LLM?".

Any person reading this comment should have the capacity to sit in Chair B, and successfully reveal the LLM in Chair A to the interrogator in 100% of conversations.

tough 3 days ago | parent [-]

That relies on the positive-aligned RLHF that most labs do.

What if you turned that 180° and trained models to deceive, lie, and try to pass the test?

lumost 3 days ago | parent | next [-]

Humans are able to quickly converge on a pattern. While I doubt that I could immediately catch all LLMs, I can certainly catch a good portion, having simply worked with them for a while. On an infinite-horizon Turing test, where I have the option to declare that Chair A is a machine at any time, I would certainly expect to detect LLMs simply by virtue of their limited conversational range.

tough 3 days ago | parent [-]

If anything I would do differently, I'd probe for things only machines can reliably do.

That holds unless the LLM and the design around it are deliberately adversarial, even before getting into red teaming or jailbreaks.

A human couldn't type for 24 hours straight, or faster than say X WPM; a human couldn't solve certain tricky problems, or know about and reply within seconds to various news events, etc. Search access and the training cutoff date seem like important factors to tie in too.
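As a rough sketch of that timing heuristic (the ceiling and function names here are my own illustrative assumptions, not an established test):

    from datetime import datetime

    # Assumption: sustained human typing rarely exceeds ~120 WPM.
    HUMAN_WPM_CEILING = 120

    def words_per_minute(text: str, start: datetime, end: datetime) -> float:
        """Crude typing-speed estimate for a single reply."""
        minutes = (end - start).total_seconds() / 60
        return len(text.split()) / minutes if minutes > 0 else float("inf")

    def looks_machine_fast(text: str, start: datetime, end: datetime) -> bool:
        """Flag replies produced faster than a fast human typist could manage."""
        # Example: 60 words delivered in 10 seconds -> 360 WPM -> flagged.
        return words_per_minute(text, start, end) > HUMAN_WPM_CEILING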

But yeah, overall, if the time is infinite you can always come up with some new way to find out; it becomes a cat-and-mouse game then, like software security nowadays.

oinfoalgo 3 days ago | parent | prev [-]

If firms were spending billions of dollars specifically to pass the Turing test, it seems absurd to me to believe the current crop of models could not pass it.

Luckily, it is obvious that spending huge amounts of money to train models on how to best deceive humans with language is a terrible idea.

That would also be gaming the test, and not in the spirit of the generality the test was trying to probe for.

Even playing tic-tac-toe against GPT-5 is a joke. The model knows enough about how the game works to let you play in text, but doesn't even notice when you've won.
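For contrast, knowing when someone has won is trivially mechanical. A minimal sketch (the string board encoding is just an illustration):

    # Board as a 9-character string, e.g. "XXXOO____" ("_" = empty).
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
             (0, 4, 8), (2, 4, 6)]             # diagonals

    def winner(board: str):
        """Return 'X' or 'O' if that player has three in a row, else None."""
        for a, b, c in LINES:
            if board[a] != "_" and board[a] == board[b] == board[c]:
                return board[a]
        return None

    print(winner("XXXOO____"))  # -> X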

The interesting part is that the model can even tell you why it sucks at tic-tac-toe:

"Because I’m not really thinking about the game like a human — I’m generating moves based on text patterns, not visualizing the board in the same intuitive way you do."

Ten years ago it would not have been conceivable that we could have models that pass the Turing test yet are hopeless at tic-tac-toe, and that can even tell you why they are bad at it.

That right there is a total invalidation of the Turing test IMO.

birn559 3 days ago | parent [-]

How would AI reliably pass the Turing test when playing tic-tac-toe reliably reveals the weaknesses of today's AI?

altruios 3 days ago | parent | prev | next [-]

I have trouble reconciling this point with the known phenomenon of hallucinations.

I would suppose the correct test is an 'infinite' Turing test, which, after a long enough conversation, LLMs invariably fail, as they eventually degrade.

I think a better measure than the binary answer to "have they passed the Turing test?" is the metric "for how long do they continue to pass the Turing test?"...

This ignores such ideas as probing the LLM's weak spots. Since they do not 'see' their input as characters but as tokens, asking them to count letters in words, or about the specifics of those sub-token divisions, provides a shortcut (for now) to making them fail the Turing test.
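A quick way to see those token boundaries, assuming the tiktoken package and its cl100k_base encoding:

    import tiktoken

    # Words arrive as multi-character token chunks, not letters,
    # which is why letter-counting questions trip models up.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("strawberry")
    print([enc.decode([t]) for t in tokens])  # a few chunks, not ten letters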

But the above approach is not in the spirit of the Turing test, as it only points out a blind spot in their perception, much like how a human would have to guess at what things would look like if UV and infrared were added to our visual field: sure, we could reason about it, but we wouldn't actually perceive those wavelengths, so we could make mistakes about those qualia. And it would say nothing about our ability to think if we could not perceive those wavelengths, even if 'more-seeing' entities judged us as inferior for it...

throwawaylaptop 3 days ago | parent [-]

I date a lot of public school teachers for some reason (hey, once you have a niche, it's easy to relate and they like you), and I assure you you'd have a better, more human conversation with an LLM than with most middle school teachers.

YeGoblynQueenne 2 days ago | parent | prev | next [-]

Useful things to keep in mind about the "Turing test":

a) It was not meant as a "test" by Turing, rather as a thought experiment.

b) Passing tests that claim to measure intelligence does not require intelligence. See:

https://en.wikipedia.org/wiki/Eugene_Goostman

rurp 3 days ago | parent | prev | next [-]

I think the article gives a much more plausible explanation for the demise of the Turing Test: the jagged frontier. In the past being able to write convincingly well seemed like a good overall proxy for cognitive ability. It turns out LLMs are excellent at spitting out reasonable sounding text, and great at producing certain types of writing, but are still terrible at many writing tasks that rely on cognitive ability.

Humans don't need to cast about for obscure cases where they are smarter than an LLM; there is an endless supply of examples. It's simply the case that the Turing Test tells us very little about the relative strengths and weaknesses of current AI capabilities.

recursivecaveat 3 days ago | parent [-]

The Turing test basically subsumes all tests that can be text-encoded, no? If you feel that LLMs are abnormally bad at a kind of writing, like an All Souls essay, you just ask the other chair to write you such an essay as one of your questions.

To be clear, I'm not aware of anyone actually running serious Turing tests today, because it's very expensive and tedious. There's one being passed around where each conversation is only 4(!) little SMS-sized messages long per side, and ChatGPT gets judged to be the human twice as often as the actual human.

m4x 3 days ago | parent | prev | next [-]

Would you consider that any current LLM is close to passing the Turing test?

If you think there's an LLM that can do so, I'd love to try it out! Even talking to the best models available today, it's disappointingly clear I'm talking to an LLM.

layer8 3 days ago | parent | prev | next [-]

The article isn’t really about intelligence, but about originality and creativity in writing.

delusional 3 days ago | parent | prev [-]

The Turing Test is a philosophical device meant to question what being a human is. It was never a benchmark or a goalpost.

sorokod 2 days ago | parent [-]

You are quite wrong; do have a look at Turing's paper.

delusional 2 days ago | parent [-]

I have in fact read it. I stand by my statement.