simonw 2 days ago

Current frontier LLMs - Claude 4, GPT-5, Gemini 2.5 - are massively more likely to say "I don't know" than last year's models.

cj 2 days ago | parent [-]

I don’t think I’ve ever seen ChatGPT 5 refuse to answer any prompt I’ve ever given it. I’m doing 20+ chats a day.

What’s an example prompt where it will say “idk”?

Edit: Just tried a silly one, asking it to tell me about the 8th continent on Earth, which doesn’t exist. How difficult is it for the model to just say “sorry, there are only 7 continents”? I think we should expect more from LLMs and stop blaming things on technical limitations. “It’s hard” is getting to be an old excuse considering the amount of money flowing into building these systems.

simonw 2 days ago | parent [-]

https://chatgpt.com/share/68b85035-62ec-8006-ab20-af5931808b... - "There are only seven recognized continents on Earth: Africa, Antarctica, Asia, Australia, Europe, North America, and South America."

Here's a recent example of it saying "I don't know" - I asked it to figure out why there was an octopus in a mural about mushrooms: https://chatgpt.com/share/68b8507f-cc90-8006-b9d1-c06a227850... - "I wasn’t able to locate a publicly documented explanation of why Jo Brown (Bernoid) chose to include an octopus amid a mushroom-themed mural."

cj 2 days ago | parent [-]

Not sure what your system prompt is, but asking the exact same prompt word for word for me results in a response talking about "Zealandia, a continent that is 93% submerged underwater."

The 2nd example isn't all that impressive since you're asking it to provide you something very specific. It succeeded in not hallucinating. It didn't succeed at saying "I'm not sure" in the face of ambiguity.

I want the LLM to respond more like a librarian: When they know something for sure, they tell you definitively, otherwise they say "I'm not entirely sure, but I can point you to where you need to look to get the information you need."

simonw 2 days ago | parent [-]

I'm using regular GPT-5, no custom instructions and memory turned off.

Can you link to your shared Zealandia result?

I think that mural result was spectacularly impressive, given that it started with a photo I took of the mural with almost no additional context.

cj 2 days ago | parent [-]

I can't link since it's in an enterprise account.

Interestingly, I tried the same question in a separate ChatGPT account and it gave a similar response to the one you got. Maybe it was pulling context from the (separate) chat thread where it was talking about Zealandia. Which raises another question: once it gets something wrong, will it just keep reinforcing the inaccuracy in future chats? That could lead to some very suboptimal behavior.

Getting back on topic, I strongly dislike the argument that this is all "user error". These models are on track to be worth a trillion dollars at some point in the future. Let's raise our expectations of them. Fix the models, not the users.

simonw 2 days ago | parent [-]

I wonder if you're stuck on an older model like GPT-4o?

EDIT: I think that's likely what is happening here: I tried the prompt against GPT-4o and got this https://chatgpt.com/share/68b8683b-09b0-8006-8f66-a316bfebda...

My consistent position on this stuff is that it's actually way harder to use than most people (and the companies marketing it) let on.

I'm not sure if it's getting easier to use over time either. The models are getting "better" but that partly means their error cases are harder to reason about, especially as they become less common.