throwaway290 3 days ago
You are asking why someone doesn't want to ship a tool that obviously doesn't work? Surely it's always better/more profitable to ship a tool that at least seems to work.
fn-mote 3 days ago | parent
GP means they aren't good at knowing when they are wrong and should spend more compute on the problem. I would say the current generation of LLMs, which "think harder" when you tell them their first response is wrong, is a training ground for learning to think harder without being told, but I don't know the obstacles.

jmye 3 days ago | parent
No? I'm interested in why LLMs are bad at knowing when they don't know the answer, and why that's a particularly difficult problem to solve.