nunez | 2 days ago
From the abstract of the paper [^0]:

> Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty

This is a de facto false equivalence, for two reasons.

First, test takers faced with hard questions have the option of _simply not guessing at all._ UNC did a study on this [^1] by administering a light version of the AMA medical exam to 14 staff members who were NOT trained in the life sciences. While most of them consistently guessed answers, roughly 6% did not. Unfortunately, the study did not distinguish correct guesses from questions left blank. OpenAI's paper proves that LLMs, at the time of writing, simply do not have the self-awareness to know whether they _really_ don't know something, by design.

Second, LLMs are not test takers in the pragmatic sense. They are query answerers. Bar argument settlers. Virtual assistants. Best friends on demand. Personal doctors on standby. That's how they are marketed and designed, at least. OpenAI wants people to use ChatGPT like a private search engine. The sources it provides when it decides to use RAG are there more to instill confidence in the answer than to encourage users to check its work. A "might be inaccurate" disclaimer at the bottom is about as effective as the Surgeon General's warning on alcohol and cigs. The stakes are so much higher with LLMs. Totally different from an exam environment.

A final remark: I remember professors hammering "engineering error" margins into us when I was a freshman in 2005. 5% was what was acceptable. That we as a society are now okay with using a technology that has a >20% chance of giving users partially or completely wrong answers to automate as many human jobs as possible blows my mind. Maybe I just don't get it.
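To make the guess-vs-abstain arithmetic concrete, here is a minimal Python sketch. The numbers are hypothetical and the `abstain_credit` knob is an assumption for illustration, not anything defined in the paper; it just shows why accuracy-only grading rewards guessing over admitting uncertainty.

```python
# Minimal sketch: expected score for one question under different grading schemes.
# All values are hypothetical; 'abstain_credit' is an assumed parameter, not
# something from the OpenAI paper.

def expected_score(p_correct: float, abstain: bool, abstain_credit: float = 0.0) -> float:
    """Expected points for a single question.

    p_correct: the answerer's chance of being right if it commits to an answer.
    abstain_credit: points for leaving the question blank (0.0 = accuracy-only grading).
    """
    if abstain:
        return abstain_credit
    # Accuracy-only grading: 1 point if right, 0 if wrong, no penalty for a wrong guess.
    return p_correct * 1.0

p = 0.2  # barely better than a wild guess
print(expected_score(p, abstain=False))                      # 0.2  -> guessing always "pays"
print(expected_score(p, abstain=True))                       # 0.0  -> abstaining never pays under accuracy-only grading
print(expected_score(p, abstain=True, abstain_credit=0.25))  # 0.25 -> partial credit makes "I don't know" the better move
```

Under the first two lines of output a guesser always beats an abstainer, which is the incentive being pointed at here; only a scheme like the third makes admitting uncertainty rational.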
amenhotep | 2 days ago | parent
> The sources it provides when it decides to use RAG are there more to instill confidence in the answer than to encourage users to check its work.

This is an insightful point. I had an exchange with someone on the internet a few days ago; he wanted to argue something and used ChatGPT to "find sources" ("I'm at work and it's quicker"). It did its search, found a page with a figure, dutifully gave him that figure and a link, and he posted it.

The figure was, if you thought about it for a second, verging on ludicrous. When pressed about it he linked the original source: a blog post that was evidently slop, and the figure was just something the AI tasked with writing that post had made up! The "research" was just a non-human centipede, but he was perfectly happy to trust it and post it because, well, ChatGPT has gone and found a source, so it must be true. Thought terminating.