I'm not sure why you're ignoring aphyr's reports. I'm also unsure why you're ignoring my original statement that having the text of the conversation that led ChatGPT to bullshit is entirely irrelevant, as being unable to repro the report is even worse for ChatGPT than being able to repro would be.

shrug

▲

simianwords 2 hours ago | parent [-]

I specified text just to ignore the voice one because it uses 4o-mini underneath. And it's kinda stupid to keep ignoring that and saving face now - reconsider this approach.

I believe this is the 5th time I'm asking this: you are not able to produce a _single_ counter example for my challenge? After all this surely I can get a direct acknowledgement here.

▲

simoncion an hour ago | parent [-]

> you are not able to produce a _single_ counter example for my challenge?

I have. For both your original challenge and your updated one.

Consider:

1) AFAICT, there's no way to tell what version of the model was used to produce the output in a ChatGPT share link.

2) You don't appear to believe my assertions that aphyr is almost certainly paying for and using the latest version of the LLMs available, and that he's faithfully reporting his interactions with the LLMs.

3) Because of #2, I expect that you won't believe me if I report that I've more-or-less reproduced father_phi's results about the cup that's sealed on the top and open on the bottom on the very latest only-available-for-pay ChatGPT model.

3a) You might attempt to check my report, but I'd be shocked if you'd consider a failure to reproduce my results to be a significant strike against ChatGPT. I'd think it's more likely that you'd either call me a liar, or tell me that I must have had some setting wrong somewhere.

3b) Even if you told me to share the ChatGPT chat that proved my assertion, #1 -combined with your demeanor throughout this conversation- tells me that you'd almost certainly claim that I was using an inferior version of the model and was lying to you.

▲

simianwords an hour ago | parent [-]

Haha ok. So still no example?

The GPT shared link shows a "thought for" which indicates using the latest thinking model. You may try that.

What you can do is this: submit a prompt that clearly makes GPT hallucinate.

You may secretly use a worse model. You may use a system prompt that deliberately gives wrong answers. But I'm going to assume you won't go that far.

We can leave it to the public to decide whether this is a legitimate counter example or not and whether it can really be reproduced. Shall we try that? I'm guessing you won't but worth a shot!

▲

simoncion 14 minutes ago | parent [-]

You weren't paying much attention to the "Consider:" part of my previous comment.

You don't believe that a well-paid, very careful, high-integrity member of the computer safety community has -on multiple occasions- encountered actual, sustained bullshitting from the latest-available for-pay version of ChatGPT. You don't accept either this fellow's reports or my informed assessment of his computing situation as truthful and accurate. On top of that, your goalpost-shifting and general demeanor throughout this conversation simply don't give me the impression that you've much integrity. I'm not spending the equivalent of ten-to-twenty six-packs to reproduce aphyr's work and -given the evidence I have before me- have you reject that, as well.

200 USD is a lot of money to throw away to "win" an Internet argument with a stranger who refuses to accept evidence presented by someone known to be careful, scrupulous, and honest.

▲

simianwords 2 minutes ago | parent [-]

> On top of that, your goalpost-shifting and general demeanor throughout this conversation simply don't give me the impression that you've much integrity. I'm not spending the equivalent of ten-to-twenty six-packs to reproduce aphyr's work and -given the evidence I have before me- have you reject that, as well.

Lol what goal post did I move? I said text only and you rejected it. You can present the example here and let the public judge it - even if my integrity is compromised. I'm allowing you to do it.

> 200 USD is a lot of money to throw away to "win" an Internet argument with a stranger who refuses to accept evidence presented by someone known to be careful, scrupulous, and honest.

200 what? I'm using the $20 one. This is getting ridiculous! You can't present a _single_ counter example!