rhubarbtree 5 days ago

I find this rather oddly phrased.

LLMs hallucinate because they are language models. They are stochastic models of language. They model language, not truth.

If the “truthy” responses are common in their training set for a given prompt, you might be more likely to get something useful as output. Feels like we fell into that idea and said: OK, this is useful as an information retrieval tool. And now we use RL to reinforce that useful behaviour. But still, it’s a (biased) language model.

I don’t think that’s how humans work. There’s more to it. We need a model of language, but it’s not sufficient to explain our mental mechanisms. We have other ways of thinking than generating language fragments.

Trying to eliminate cases where a stochastic model the size of an LLM gives “undesirable” or “untrue” responses seems rather odd.

crystal_revenge 5 days ago | parent | next [-]

People also tend not to understand the absurdity of assuming that we can make LLMs stop hallucinating. It would imply not only that truth is absolutely objective, but that it exists on some smooth manifold which language can be mapped to.

That means there would be some high dimensional surface representing "all true things". Any fact could be trivially resolved as "true" or "false" simply by exploring whether or not it was represented on this surface. Whether or not "My social security number is 123-45-6789" is true could be determined simply by checking whether that statement was mappable to the truth manifold. Likewise you could wander around that truth manifold and start generating output of all true things.

If such a thing existed it would make even the wildest fantasies about AGI seem tame.

edit: To simplify it further, this would imply you could have an 'is_true(statement: string): bool' function for any arbitrary statement in English.
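
To spell out what that implies: the manifold view amounts to assuming a universal decision procedure for natural-language claims. A purely hypothetical sketch (nothing like this can exist; the name and signature just illustrate the claim):

    # Hypothetical oracle implied by a "truth manifold" over language.
    # No such procedure exists for arbitrary English statements.
    def is_true(statement: str) -> bool:
        raise NotImplementedError("this is exactly the oracle that would have to exist")

    # If it did exist, any claim could be settled mechanically, e.g.:
    #   is_true("My social security number is 123-45-6789")
    #   is_true("There are infinitely many twin primes")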

jdietrich 5 days ago | parent | next [-]

>People also tend not to understand the absurdity of assuming that we can make LLMs stop hallucinating. It would imply not only that truth is absolutely objective, but that it exists on some smooth manifold which language can be mapped to.

Frankly, this is a silly line of argument. There is a vast spectrum between regularly inventing non-existent citations and total omniscience. "We can't define objective truth" isn't a gotcha, it's just irrelevant.

Nobody in the field is talking about or working on completely eliminating hallucinations in some grand philosophical sense, they're just grinding away at making the error rate go down, because that makes models more useful. As shown in this article, relatively simple changes can have a huge effect and meaningful progress is being made very rapidly.

We've been here before, with scepticism about Wikipedia. A generation of teachers taught their students "you can't trust Wikipedia, because anyone can edit it". Two decades and a raft of studies later, it became clear that Wikipedia is at least as factually accurate as traditional encyclopedias and textbooks. The contemporary debate about the reliability of Wikipedia is now fundamentally the same as arguments about the reliability of any carefully-edited resource, revolving around subtle and insidious biases rather than blatant falsehoods.

Large neural networks do not have to be omniscient to be demonstrably more reliable than all other sources of knowledge, they just need to keep improving at their current rate for a few more years. Theoretical nitpicking is missing the forest for the trees - what we can empirically observe about the progress in AI development should have us bracing ourselves for radical social and economic transformation.

apsurd 4 days ago | parent | next [-]

You're not being charitable with that take. It seems like you just swapped "objective truth" for your own flavor: "error rate".

What is an error? How does the LLM "know"?

The Wikipedia example is good. I'd say its "truth" is based on human-curated consensus; everyone gets that. What I don't get is: what's the LLM analog? As you state, it's just about making the error rate go down. OK, so what is an error? Does it require a human in the loop?

skydhash 4 days ago | parent | prev | next [-]

The thing is, for a lot of tasks, a formal method (either algorithmic or a simulation) can be much more efficient to create and run, with more reliable results. And for a lot of cases, creating a simpler and smaller model with other ML techniques can be as good as or better than LLMs.

There's still no justification for the whole investment craze in LLMs.

mqus 5 days ago | parent | prev | next [-]

Well, no. The article pretty much says that any arbitrary statement can be mapped to {true, false, I don't know}. This is still not 100% accurate, but it at least seems reachable. The model just needs to be able to recognize unknowns, not verify every single fact.

gary_0 5 days ago | parent [-]

Determining a statement's truth (or if it's outside the system's knowledge) is an old problem in machine intelligence, with whole subfields like knowledge graphs and such, and it's NOT a problem LLMs were originally meant to address at all.

LLMs are text generators that are very good at writing a book report based on a prompt and the patterns learned from the training corpus, but it's an entirely separate problem to go through that book report statement by statement and determine if each one is true/false/unknown. And that problem is one that the AI field has already spent 60 years on, so there's a lot of hubris in assuming you can just solve that and bolt it onto the side of GPT-5 by next quarter.

red75prime 4 days ago | parent [-]

> And that problem is one that the AI field has already spent 60 years on

I hope you don't think that the solution will be a closed-form expression. The solution should involve exploration and learning. The things that LLMs are instrumental in, you know.

sirwhinesalot 4 days ago | parent | next [-]

Not the same person, but I think the "structure" of what the ML model is learning can have a substantial impact, especially if it then builds on that to produce further output.

Learning to guess the next token is very different from learning to map text to a hypervector representing a graph of concepts. This can be witnessed in image classification tasks involving overlapping objects where the output must describe their relative positioning. Vector-symbolic models perform substantially better than more "brute-force" neural nets of equivalent size.
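
For a rough sense of what "map text to a hypervector representing a graph of concepts" means, here is a toy sketch of vector-symbolic binding and bundling (illustrative only, not any specific model from the literature; the concept names and the above/below relation are made up for the example):

    import numpy as np

    rng = np.random.default_rng(0)
    D = 10_000  # hypervector dimensionality

    def hv():
        # random bipolar hypervector standing in for an atomic concept or role
        return rng.choice([-1, 1], size=D)

    CAT, MAT, ABOVE, BELOW = hv(), hv(), hv(), hv()

    # bind role-filler pairs with elementwise multiplication,
    # bundle them into one "scene" vector by summing
    scene = ABOVE * CAT + BELOW * MAT   # "the cat is above, the mat is below"

    def cleanup(v, memory):
        # nearest stored concept by dot-product similarity
        return max(memory, key=lambda name: np.dot(v, memory[name]))

    memory = {"CAT": CAT, "MAT": MAT}
    # unbinding: multiplying by a role again (it is its own inverse)
    # approximately recovers whatever was bound to that role
    print(cleanup(ABOVE * scene, memory))   # -> CAT
    print(cleanup(BELOW * scene, memory))   # -> MAT

The point of the sketch is that the relational structure ("what is above what") lives in the vector algebra itself, rather than having to be rediscovered token by token.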

But this is still different from hardcoding a knowledge graph or using closed-form expressions.

Human intelligence relies on very similar neural structures to those we use for movement. Reference frames are both how we navigate the world and also how we think. There's no reason to limit ourselves to next token prediction. It works great because it's easy to set up with the training data we have, but it's otherwise a very "dumb" way to go about it.

red75prime 2 days ago | parent [-]

I mostly agree. But next-token prediction is the pretraining phase of an LLM, not all there is to LLMs.

gary_0 4 days ago | parent | prev [-]

Of course not, expert systems were abandoned decades ago for good reason. But LLMs are only one kind of ANN. Unfortunately, when all you have is a hammer...

thisoneisreal 5 days ago | parent | prev | next [-]

A great book in this vein is "Language vs. Reality." The main thesis of the book is that language evolved to support approximate, ad hoc collaboration, and is woefully inadequate for doing the kind of work that e.g. scientists do, which requires incredible specificity and precision (hence the amount of effort devoted to definitions and quantification).

BobbyTables2 5 days ago | parent | prev | next [-]

Agree. I deeply suspect the problem of asking an LLM to not hallucinate is equivalent to the classic Halting Problem.

beeflet 4 days ago | parent | prev [-]

Maybe if a language model were so absolutely massive, it could <think> enough to simulate the entire universe and determine your social security number.

riwsky 4 days ago | parent [-]

42

thisoneisreal 5 days ago | parent | prev | next [-]

This strikes me as a perfect description of the core problem. Whenever I think about this, what sticks out to me is that other animals do all sorts of things that look like "intelligence," or at least cognition, and they do it totally without language. My cat clearly recognizes objects, assigns them different values ("scary," "tasty," "fun to play with"), interacts with them in some kind of loop, even predicts their behavior to some extent and acts curious about them (it was really fun to watch her try to figure out the construction guys when I had some work done on my house over a period of a few days). These strike me as much more foundational aspects of intelligence than language. Language has of course contributed immeasurably to human cognition and intelligence, but it's almost certainly built on these pre-linguistic foundations. Another very good hint in this direction is all of the non-verbal thinking that humans do. Einstein has a famous quote about thinking visually and physically, without using language at all. All of these are powerful suggestions that something else is going on, and most likely some aspects of these things are necessary for true intelligence.

simianparrot 4 days ago | parent [-]

I’ve always thought everyone agreed language was a lossy but useful method of compression for sharing inner concepts and ideas. That my conscious thoughts are “in a language” doesn’t mean my reasoning and entire being interacts with the world using language.

I’m only “thinking in language” when I’m practicing compressing my intent into a shareable format. I don’t think about the majority of highly complex interactions I have with the physical world throughout the day.

As a child did you need to be able to explain in language how the physics of a swing works to be able to use it? Did other kids have to explain it to you in detailed language for you to pick up on how to move your body to do complex tasks?

No. In fact exactly because our compression and decompression of language is even more limited as children, we rely more heavily on raw observation and mimicry of actions occurring in reality itself.

The very idea that a language model can recreate everything we do from the lossy and compressed languages we use to share limited descriptions of much more complex intentions and actions is fundamentally flawed and oversimplified.

utyop22 5 days ago | parent | prev | next [-]

The reality is, language itself does not capture the entirety of what is really going on. And I'd even argue it's the poorest way of expressing things - but one that enables transmission through various mediums efficiently, on a cost basis.

E.g. when I explain a concept, what comes to my mind is not a string of letters and words. There is a mix of imagery and even sounds that I may have acquired from learning about a concept - then I translate that into text so it can be communicated.

There's a reason why people use native subtitles when watching Netflix - text complements imagery and sound.

kelnos 4 days ago | parent | next [-]

I use subtitles because sometimes I have trouble understanding the actors. I believe I read something that suggested that the sound mix in movies and cinematic TV shows has changed a lot in the past couple of decades, and as a result it's harder to understand dialogue.

I don't like this; I find my eyes spending more time than I'd like on the text, and not enough on the visual imagery on the rest of the screen. If I truly wanted more text, I'd just read a book.

pawelmurias 5 days ago | parent | prev [-]

I would assume most people use native subtitles when it's hard to understand what words the actors said.

ekianjo 5 days ago | parent | next [-]

Yeah, because modern filmmakers make it very hard to hear dialogue for some reason, and actors are encouraged to mumble. If I remember correctly, even Nolan admitted it.

jibal 5 days ago | parent [-]

And they often speak very quickly--I often rewind to catch critical plot points. It's a lot different from a stage play, where actors enunciate so clearly. (Not that I want stage cadence and booming voices from a film ... they are different art forms.)

Also, I watch a lot of English-language material that uses accents quite different from what my ears are tuned to.

jibal 5 days ago | parent | prev | next [-]

That's why I do.

utyop22 5 days ago | parent | prev [-]

No, that is not the reason.

People watch Netflix to switch their brains off - having the text there, along with the visuals and sound, helps deliver the content. However, text is inferior to both visuals and sound as a delivery mechanism.

keanebean86 5 days ago | parent [-]

Subtitles increase the signal-to-noise ratio. At least in our house: we have to keep the TV low so as not to wake the child. A volume of 10 with subtitles is similar to a volume of 16 without them.

crabmusket 5 days ago | parent | prev | next [-]

> I don’t think that’s how humans work.

Every time this comes up I have to bring up Deutsch. He has the best description of intelligent cognition that I've come across. He takes Popper's "conjecture and criticism" approach to science and argues that this guess-and-check loop applies to all our thinking.

E.g. understanding spoken language has some elements of guessing what might have been said and checking that against the sounds we heard. Visual processing works analogously.

LLMs seem to be great at conjecturing stuff, but seem incapable of checking or even knowing they need to check.
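
A minimal sketch of what that guess-and-check loop looks like as a program, assuming a hypothetical generate() proposer and an independent check() verifier (tests, retrieval, a proof checker, etc.):

    # Conjecture-and-criticism as a loop: keep proposing until a candidate
    # survives independent criticism, otherwise abstain.
    def conjecture_and_criticize(generate, check, max_tries=5):
        for _ in range(max_tries):
            candidate = generate()    # conjecture (what LLMs are good at)
            if check(candidate):      # criticism (the part LLMs seem to lack)
                return candidate
        return None                   # abstain rather than return an unchecked guess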

codethief 4 days ago | parent [-]

> Every time this comes up I have to bring up Deutsch. He has the best description of intelligent cognition that I've come across.

Would you have a reference?

crabmusket 4 days ago | parent [-]

If you like books, read The Beginning of Infinity. If you don't, I can't help! I wish there were something I could point to online, but nothing really encapsulates the lessons I took from that book. Yes, I'll have to write that thing one day.

codethief 4 days ago | parent [-]

Thanks so much!

munchler 5 days ago | parent | prev | next [-]

This is directly addressed in the article, which states that language models can be trained to abstain when uncertain, by changing how rewards are set up. Incentives currently encourage guessing rather than being honest about uncertainty. If you disagree, it would be helpful to explain why, rather than just responding to the title alone.
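
A rough illustration of the incentive point (the scoring values here are illustrative, not the article's exact numbers): under binary grading a guess never scores worse than "I don't know", so guessing dominates; once wrong answers carry a penalty, abstaining wins whenever the model's confidence is low.

    # Expected score of guessing vs. abstaining, for a model whose best guess
    # is correct with probability p.
    def expected_scores(p, right=1.0, wrong=0.0, idk=0.0):
        return p * right + (1 - p) * wrong, idk   # (guess, abstain)

    for p in (0.2, 0.5, 0.8):
        g_bin, a_bin = expected_scores(p, wrong=0.0)    # binary grading
        g_pen, a_pen = expected_scores(p, wrong=-1.0)   # penalise wrong answers
        print(f"p={p:.1f}  binary: guess {g_bin:+.2f} vs idk {a_bin:+.2f}   "
              f"penalised: guess {g_pen:+.2f} vs idk {a_pen:+.2f}")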

asats 5 days ago | parent | prev | next [-]

Exactly. I always found it strange when people assume that "hallucinations" are just some sort of bug in the system, as if tweaking some code or the training setup will produce an oracle of absolute truth incapable of making mistakes.

ComplexSystems 5 days ago | parent | prev | next [-]

> Trying to eliminate cases where a stochastic model the size of an LLM gives “undesirable” or “untrue” responses seems rather odd.

Why? It seems no odder than trying to eliminate cases where it gives "undesirable" code snippets with hallucinated errors. This is very important and not odd at all.

rhubarbtree 5 days ago | parent [-]

To clarify: because you will be left with a biased language model. It will continue to hallucinate, and as you squeeze out some hallucinations in one part of the language space you may well create new ones elsewhere. It doesn't seem a solid line of attack.

didibus 5 days ago | parent | prev | next [-]

I agree with everything you said except:

> Trying to eliminate cases where a stochastic model the size of an LLM gives “undesirable” or “untrue” responses seems rather odd.

Take it back to what it is, like you say: a predictive model. The work of any ML scientist is to iterate on the model to try to get perfect accuracy on unseen data. It makes sense to want to tune the models to lower the rate of predictive errors. And because perfect predictive accuracy is rarely possible, you need to make judgment calls between precision and recall, which, in the case of LLMs, directly affects how often the model will hallucinate versus how often it will stay silent or overly cautious.
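
A toy version of that judgment call, using a confidence threshold as the knob (the confidence scores and correctness labels below are made up): raise the threshold and the answers you do give get more precise, but you answer less often.

    # Answer-or-abstain policy: answer only when confidence clears a threshold.
    # Each item is (model confidence in its answer, whether that answer is correct).
    items = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
             (0.60, False), (0.40, True), (0.30, False), (0.10, False)]

    def stats(threshold):
        answered = [correct for conf, correct in items if conf >= threshold]
        precision = sum(answered) / len(answered) if answered else 1.0
        answer_rate = len(answered) / len(items)   # recall-like: how often we answer at all
        return precision, answer_rate

    for t in (0.0, 0.5, 0.85):
        p, r = stats(t)
        print(f"threshold={t:.2f}  precision={p:.2f}  answer rate={r:.2f}")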

rubatuga 5 days ago | parent [-]

But we're getting into the limits of knowledge and what is true/untrue. A stochastic model will be wrong sometimes.

didibus 5 days ago | parent [-]

Of course, 100% prediction accuracy cannot be achieved.

I just mean that, if you're a team of ML scientists, you don't just go: we got 76% accuracy, let's close up shop, mail in your resignations, job over.

From that angle, it's not odd at all that the team just continues working and now sees if it can achieve greater than 76%.

humanfromearth9 4 days ago | parent | prev [-]

Humans think with inductive and deductive reasoning. First inductive, then we generalize and deduce, which allows for quick decision-making and hence increases our survival fitness. I don't know how the transition from inductive to deductive is done, and that's probably why AI is currently not able to reason like humans.