I think it's too early to declare the Turing test passed. You just need to have a conversation long enough to exhaust the context window. Less than that, since response quality degrades long before you hit hard window limits. Even with compaction.

Neuroplasticity is hard to simulate in a few hundred thousand tokens.

zug_zug an hour ago | parent | next [-]

"You're absolutely right!"

I think for a while the test was passed. Then we learned the hallmark characteristics of these models, and now most of us can easily differentiate. That said -- these models are programmed specifically to be more helpful, more articulate, more friendly, and more verbose than people, so that may not be a fair expectation. Even so, I think if you took all of that away, you'd be able to differentiate the two, it just might take longer.

	drob518 an hour ago | parent [-]
		Right. I think modern LLMs are quite good at mimicking human words, but we were initially taken in, much as people were by ELIZA in the 1960s. It’s an (increasingly sophisticated) magic trick, but it’s just a trick.

downboots 2 hours ago | parent | prev | next [-]

It was not meant as a pass/fail test.

criley2 2 hours ago | parent | prev [-]

For a Turing test as rigorous as the one you present, I believe many (or even most) humans would also fail it.

How many humans seriously have the attention span to have a million "token" conversation with someone else and get every detail perfect without misremembering a single thing?

	nine_k 2 hours ago | parent | next [-]
		But context window exhaustion does not look like mere forgetfulness. It looks more like a loss of general coherence, like getting drunk.
	stickfigure 2 hours ago | parent | prev | next [-]
		Response quality degrades long before you hit a million tokens. But sure, let's say it doesn't. If you interact with someone day after day, you'll eventually hit a million tokens. Add some audio or images and you will exhaust the context much, much faster. However, I'll grant you that Turing's original imitation game (text only, human typist, five minutes) is probably pretty close, and that's impressive enough to call intelligence (of a sort). Though modern LLMs tend to manifest obvious dead giveaways like "you're absolutely right!"
	dairem 2 hours ago | parent | prev [-]
		Doesn't the Turing test require a human too, to be compared to the AI?