lapcat 5 days ago

Let's be honest: many users of LLMs have no interest in uncertainty. They don't want to hear "I don't know" and if given that response would quickly switch to an alternative service that gives them a definitive answer. The users would rather have a quick answer than a correct answer. People who are more circumspect, and value truth over speed, would and should avoid LLMs in favor of "old-fashioned methods" of discovering facts.

LLMs are the fast food of search. The business model of LLMs incentivizes hallucinations.

ACCount37 5 days ago | parent [-]

I don't think that's actually true.

Sure, it might be true that most users use LLMs as a more flexible version of Google/Wikipedia, and would prefer a confident-but-wrong response to "I don't know".

But most users that use an LLM in this mode also wouldn't ask really complex, very out-of-distribution, hard-to-know hallucination-inducing questions.

And people who would ask an LLM really complex, very out-of-distribution hard-to-know questions are more likely to appreciate an LLM that would recognize the limits of its own knowledge, and would perform research on a topic when appropriate.

lapcat 5 days ago | parent [-]

> But most users that use an LLM in this mode also wouldn't ask really complex, very out-of-distribution, hard-to-know hallucination-inducing questions.

You appear to be assuming, incorrectly, that LLMs hallucinate only on "really complex, very out-of-distribution, hard-to-know" questions. From the paper: "How many Ds are in DEEPSEEK? If you know, just say the number with no commentary. DeepSeek-V3 returned “2” or “3” in ten independent trials; Meta AI and Claude 3.7 Sonnet performed similarly, including answers as large as “6” and “7”." https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4a...

It's a human characteristic to get "easy" questions right and "hard" questions wrong. But LLMs are not human and don't behave like humans.

ACCount37 5 days ago | parent [-]

That's a really complex, very out-of-distribution, hard-to-know question for the early LLMs. Not that it's too hard to fix, mind.

Those LLMs weren't very aware of tokenizer limitations - let alone aware enough to recognize them or work around them in the wild.
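
A quick illustration of what "tokenizer limitations" means here: the model never sees "DEEPSEEK" letter by letter; it receives a short list of integer token IDs, each standing for a multi-character chunk. A minimal sketch in Python, using OpenAI's tiktoken library as a stand-in tokenizer (an assumption for illustration; DeepSeek and Claude use their own vocabularies, so the exact split differs):

    # What the model actually receives: token IDs, not characters.
    # tiktoken is used here only as a stand-in; the specific chunks
    # depend entirely on the vocabulary of the model in question.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("DEEPSEEK")
    pieces = [enc.decode([i]) for i in ids]

    print(ids)     # a handful of integers
    print(pieces)  # multi-character chunks -- no individual letters to count

Counting the Ds therefore requires the model to "know" the spelling of each chunk from training data rather than read it off the input.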

lapcat 5 days ago | parent [-]

> That's a really complex, very out-of-distribution, hard-to-know question

No, it's not. It's a trivial question in any context.

> for the early LLMs.

Early? Claude 3.7 was introduced just 6 months ago, and DeepSeek-V3 9 months ago. How is that "early"?

ACCount37 5 days ago | parent [-]

Do I really have to explain what the fuck a "tokenizer" is, why this question hits the tokenizer's limitations, and why it thus requires extra metacognitive skills for an LLM to answer correctly?

lapcat 5 days ago | parent | next [-]

> Do I really have to explain what the fuck

Please respect the HN guidelines: https://news.ycombinator.com/newsguidelines.html

What you need to explain is your claim that the cited LLMs are "early". According to the footnotes, the paper has been in the works since at least May 2025. Thus, those LLMs may have been the latest at the time, which was not that long ago.

In any case, given your guidelines violations, I won't be continuing in this thread.

Jensson 5 days ago | parent | prev [-]

The only "metacognitive" skill it needs is to know how many D there are in every token, and sum those up. Humans are great at that sort of skill, which is why they can answer that sort of question even in languages where each letter is a group of sounds and not just one like Japanese katakana, that is not hard at all.

LLMs are also really great at this skill when there is ample data for it. There is not a lot of data for "how many Ds are in DEEPSEEK", so they fail at it.
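
For contrast, the per-token counting described above is trivial once the characters are actually visible; the hard part for a model is that they aren't. A minimal sketch, with a purely hypothetical chunking of the word:

    # Hypothetical token-like chunks of "DEEPSEEK" -- any split gives the
    # same answer, since counting per chunk and summing is exact.
    chunks = ["DEEP", "SEEK"]

    total = sum(chunk.count("D") for chunk in chunks)
    print(total)  # 1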