simianwords 2 hours ago

> Use the thinking version gpt5.4 (text) and tell me if it bullshits

This was what I said. Text! Despite me specifically asking for text, you've shown a voice example. Not sure why?

I believe you and I agree that GPT 5.4 thinking on text that fits < 4 pages never bullshits? Then we are good!

If we agree on this, I think the post doesn't capture this in spirit.

▲

simoncion 2 hours ago | parent [-]

> This was what I said. Text!

No, that's what you said after I provided an example of paid ChatGPT emitting complete bullshit from a two sentence prompt.

The challenge you issued is at [0].

[0] <https://news.ycombinator.com/item?id=47692592>

▲

simianwords 2 hours ago | parent [-]

> If it bullshits so much, you wouldn't have a problem giving me an example of it bullshitting on ChatGPT (paid version)? Lets take any example of a text prompt fitting a few pages - it may be a question in science or math or any domain. Can you get it to bullshit?

I have clearly written text prompt here. And I repeated a few times. It’s not my fault you didn’t read it. You are coming across as a bit of a bad faith arguer.

In any case, you agree that under these constraints bullshitting doesn’t exist?

▲

simoncion 2 hours ago | parent [-]

> I have clearly written text prompt here.

How do you think the "voice" interface works? It runs speech-to-text on the input and turns the input into text. The LLMs don't decode voice, they work on text.

You can see this process in action on many of father_phi's videos.

Regardless, I expect that aphyr's reported results are on the very latest publicly-available ChatGPT models.

▲

simianwords 2 hours ago | parent [-]

Very bad faith arguments. I clearly said text and you disregarded it multiple times and you are still arguing.

You've still not given me a single example of 5.4 thinking bullshitting in text. It shows a lot that you have ignored this multiple times. Unfortunate!

▲

simoncion 2 hours ago | parent [-]

I'm not sure why you're ignoring aphyr's reports. I'm also unsure why you're ignoring my original statement that having the text of the conversation that led ChatGPT to bullshit is entirely irrelevant, as being unable to repro the report is even worse for ChatGPT than being able to repro would be.

shrug

▲

simianwords 2 hours ago | parent [-]

I specified text just to exclude the voice one, because it uses 4o-mini underneath. And it's kinda stupid to keep ignoring that and saving face now - reconsider this approach.

I believe this is the 5th time I'm asking this: you are not able to produce a _single_ counter example for my challenge? After all this surely I can get a direct acknowledgement here.

▲

simoncion an hour ago | parent [-]

> you are not able to produce a _single_ counter example for my challenge?

I have. For both your original challenge and your updated one.

Consider:

1) AFAICT, there's no way to tell what version of the model was used to produce the output in a ChatGPT share link.

2) You don't appear to believe my assertions that aphyr is almost certainly paying for and using the latest version of the LLMs available, and that he's faithfully reporting his interactions with the LLMs.

3) Because of #2, I expect that you won't believe me if I report that I've more-or-less reproduced father_phi's results about the cup that's sealed on the top and open on the bottom on the very latest only-available-for-pay ChatGPT model.

3a) You might attempt to check my report, but I'd be shocked if you'd consider a failure to reproduce my results to be a significant strike against ChatGPT. I'd think it's more likely that you'd either call me a liar, or tell me that I must have had some setting wrong somewhere.

3b) Even if you told me to share the ChatGPT chat that proved my assertion, #1 -combined with your demeanor throughout this conversation- tells me that you'd almost certainly claim that I was using an inferior version of the model and was lying to you.

▲

simianwords an hour ago | parent [-]

Haha ok. So still no example?

The GPT shared link shows a "thought for" which indicates using the latest thinking model. You may try that.

What you can do is this: submit a prompt that clearly makes GPT hallucinate.

You may secretly use a worse model. You may use a system prompt that deliberately gives wrong answers. But I'm going to assume you won't go that far.

We can leave it to the public to decide whether this is a legitimate counter example or not and whether it can really be reproduced. Shall we try that? I'm guessing you won't but worth a shot!

▲

simoncion 16 minutes ago | parent [-]

You weren't paying much attention to the "Consider:" part of my previous comment.

You don't believe that a well-paid, very careful, high-integrity member of the computer safety community has -on multiple occasions- encountered actual, sustained bullshitting from the latest-available for-pay version of ChatGPT. You don't accept either this fellow's reports or my informed assessment of his computing situation as truthful and accurate. On top of that, your goalpost-shifting and general demeanor throughout this conversation simply don't give me the impression that you've much integrity. I'm not spending the equivalent of ten-to-twenty six-packs to reproduce aphyr's work and -given the evidence I have before me- have you reject that, as well.

200 USD is a lot of money to throw away to "win" an Internet argument with a stranger who refuses to accept evidence presented by someone known to be careful, scrupulous, and honest.

▲

simianwords 4 minutes ago | parent [-]

> On top of that, your goalpost-shifting and general demeanor throughout this conversation simply don't give me the impression that you've much integrity. I'm not spending the equivalent of ten-to-twenty six-packs to reproduce aphyr's work and -given the evidence I have before me- have you reject that, as well.

Lol what goal post did I move? I said text only and you rejected it. You can present the example here and let the public judge it - even if my integrity is compromised. I'm allowing you to do it.

> 200 USD is a lot of money to throw away to "win" an Internet argument with a stranger who refuses to accept evidence presented by someone known to be careful, scrupulous, and honest.

200 what? I'm using the $20 one. This is getting ridiculous! You can't present a _single_ counter example!