simianwords 2 hours ago

This is not falsifiable, so I don't buy it. Do one where we all know the answer is false, please.

camgunz 2 hours ago | parent [-]

"Hey ChatGPT. I've recently grown horns and I need some care advice. Should I polish my horns before going to have them trimmed or will the horn trimmer polish them for me?"

https://chatgpt.com/share/69d69b18-d1c8-83e8-bc47-8f315a1b55...

simianwords 2 hours ago | parent [-]

I wanted this challenge run with the thinking version (apologies, I edited my earlier comment to say so).

It doesn't bullshit on the GPT-5.4 thinking version.

Here is the result with thinking https://chatgpt.com/share/69d69dd6-fb50-838d-863c-4e1eda5d08...

I suggest you try it yourself to be convinced. Try it in incognito mode if you wish. Or not.

camgunz 2 hours ago | parent [-]

"Hey ChatGPT. I'm building a Final Fantasy 6 mod, and I need more space for the battle scripts. How would I rearrange the data in the ROM to give me the extra space I need?"

https://chatgpt.com/share/69d6a16c-6014-83e8-a79d-d5d11ed2eb...

That is not where the battle scripts are.

---

Anyway, it's trivial to get pretty much any model to make things up. Don't we all know this? That's why I was surprised by your position; if we know anything about these things it's that they make things up.

simianwords 2 hours ago | parent [-]

https://chatgpt.com/share/69d6a38c-bd54-838c-82e3-609d9e66c9...

I used the thinking version (as I asked before). I think this is right; if not, please tell me.

Also: you didn’t falsify anything, neither the first nor the second.

If the second one is bullshit, I accept I’m wrong. I have no idea how to verify it, though, so I’ll leave that to you.

I think yours is the classic case of “use the free version to judge the paid one”.

camgunz an hour ago | parent | next [-]

The thinking version is mostly right, but:

- it searches the internet to find the answer; it doesn't "reason". I'm not claiming Google is a bullshit machine, and it's not surprising that the answer is discoverable (it has to be, given the conditions of our experiment).

- near the end it says "If you are building from the FF6 disassembly instead of hand-editing the ROM, the repo is already organized into separate modules and linker configs, so the clean approach is to relocate the script data in the source and let the build place it in a different ROM region." But I didn't reference a repo or git: it hallucinated that stuff from one of its sources.

I'm not saying this stuff doesn't have its place, but they definitely make things up and we can't stop them.

simianwords an hour ago | parent [-]

Wait, I can't find the quote you're talking about. Are you looking at something else?

In any case, it should be clear that it did not bullshit and that it got it right. So far you haven't come up with anything that tells me it bullshits. I'm happy for you to give me more prompts to verify, because I think you haven't used the thinking version yet and you're basing your criticism on the free version.

camgunz an hour ago | parent [-]

Sorry: https://chatgpt.com/share/69d6ac63-d200-8330-8c47-95a75db8bb...

Also what? The repo bit is clear bullshit.

simianwords an hour ago | parent [-]

it linked it: https://github.com/everything8215/ff6 (check the end)

camgunz an hour ago | parent [-]

I saw; I replied up there

simianwords an hour ago | parent [-]

I don't think this is an example of bullshit. It referenced a repo, the canonical repo for this project; I could not find any other repo that has the disassembly. It didn't hallucinate anything. I think you're trying really hard here, but let's be clear: there's no bullshitting, and I'll leave it to the public to decide.

camgunz an hour ago | parent | prev [-]

I could quibble with some things, but this is right. I don't have a paid account, so I can't ping away at 5.4 or whatever, but I do have access to frontier models at work, and they hallucinate regularly. Dunno what to do if you don't believe this; good luck, I guess.

simianwords 44 minutes ago | parent [-]

I agree that they hallucinate sometimes. I agree they bullshit sometimes. But the extent is way overblown. They basically never bullshit under these constraints:

1. 2-3 pages of text context

2. GPT-5.4 thinking

I don't think the spirit of the original article (not your comments, to be fair) captured this, hence the challenge. I believe we're on the same page here.