| ▲ | ACCount37 5 days ago |
| Is there any knowledge of "correct vs incorrect" inside you? If "no", then clearly, you can hit general intelligence without that. And if "yes", then I see no reason why an LLM can't have that knowledge crammed inside it too. Would it be perfect? Hahahaha no. But I see no reason why "good enough" could not be attained. |
|
| ▲ | wavemode 5 days ago | parent | next [-] |
> Is there any knowledge of "correct vs incorrect" inside you?

There is a sort of knowledge humans possess that LLMs don't (and in fact can't, without a fundamental architectural change): knowledge of how certain one is about something. If you ask a human a question about how something works in biology, they will be able to give you an answer as well as a sort of "epistemic" citation - the difference between "I don't remember where exactly I originally read that, but I'm a research biologist and am quite certain that's how it works" and "I don't remember where I read that - it's probably just something we learned about in biology class in high school. Take it with a grain of salt, as I could be misremembering."

LLMs don't have this reflexive sense of their own knowledge. There's a fundamental divide between training data (their "knowledge") and context (their "memory"), which leaves them unable to really understand how they know what they know - or, indeed, whether they truly know it at all. If a model could be created where the context and training data were unified, like in a brain, I could see a more realistic path to general intelligence than what we have now.
| |
| ▲ | ACCount37 5 days ago | parent [-] |
LLMs have that knowledge. Just not nearly enough of it. Some of it leaks through from the dataset, even in base models. The rest has to be taught on purpose.

You can get an LLM to generate a list of facts that includes hallucinations - and then give that list to another instance of the same LLM and have it grade how certain it is of each fact listed. The evaluation wouldn't be perfect, but it'll outperform chance. You can make it better with the right training - or much worse, with the wrong training.

Getting an LLM to be fully aware of all the limits of its knowledge is likely to be impractical, if not outright impossible, but you can improve this awareness by a lot and set a conservative baseline for behavior, especially in critical domains. "Fully aware of all the limits of its knowledge" is unattainable for humans too, so LLMs are in good company.
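For the curious, a minimal sketch of that two-pass probe might look like this. It assumes the OpenAI Python client; the model name and the prompts are placeholders invented for illustration, not anything specified in the comment:

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # assumed placeholder; any chat model would do


def ask(prompt: str) -> str:
    # Each call is independent, so the grading pass shares no context
    # with the pass that generated the facts.
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


# Pass 1: elicit niche "facts", some of which will likely be hallucinated.
facts = ask("List 10 one-sentence facts about obscure 1980s home computers, numbered 1-10.")

# Pass 2: a fresh instance grades its own confidence in each listed fact.
grades = ask(
    "For each numbered statement below, rate from 0 to 100 how confident you are "
    "that it is true. Reply with just the number and the rating.\n\n" + facts
)
print(facts)
print(grades)
```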
| ▲ | wavemode 5 days ago | parent [-] |
No, LLMs don't have that knowledge. They can't inspect their own weights and examine the contents. It's a fundamental limitation of the technology.

The sort of training you're talking about is content like, "ChatGPT was trained on research papers in the area of biology. It possesses knowledge of A, B, and C. It does not possess knowledge of X, Y and Z." But this merely creates the same problem in a loop - given a question, how does the LLM -know- that its training data contains information about whether or not its training data contains information about the answer to the question? The reality is that it doesn't know; you just have to assume that it did not hallucinate that.

The problem of being unaware of these things is not theoretical - anyone with deep knowledge of a subject will tell you that as soon as you go beyond the surface level of a topic, LLMs begin to spout nonsense. I'm only a software engineer, but even I regularly see the phenomenon of getting good answers to basic questions about a technology and then, beyond that, getting completely made-up features and function names.

> "Fully aware of all the limits of its knowledge" is unattainable for humans too

This just isn't true. Humans know whether they know things, and whether they know how they know it, and whether they know how they know how they know it, and... Knowledge itself can contain errors, but that's not what I'm talking about. I'm not talking about never being wrong. I'm merely talking about having access to the contents of one's own mind. (Humans can also dynamically update specific contents of their own mind, but that's not even what I'm talking about right now.)

An LLM's hallucination is not just knowledge that turned out to be wrong; it is knowledge that never existed to begin with, and the LLM has no way of telling the difference.
| ▲ | ACCount37 5 days ago | parent | next [-] |
Humans can't "inspect their own weights and examine the contents" either. No human has ever managed to read out their connectome without external instrumentation. There were entire human civilizations that thought the seat of consciousness was the heart - which, for creatures that claim to know how their own minds work, is a baffling error to make.

LLMs are quite similar to humans in that respect. They, too, have no idea what their hidden size is, how many weights they have, how exactly the extra modalities are integrated into them, or whether they're MoE or dense. They're incredibly ignorant of their own neural architecture. And if you press them on it, they'll guess, and they'll often be wrong.

The difference between humans and LLMs comes down to the training data. Humans learn continuously - they remember what they've seen and what they haven't, they try things, they remember the outcomes, and they get something of a grasp (and no, it's not anything more than "something of a grasp") of how solid or shaky their capabilities are. LLMs split training and inference in two, and their trial-and-error doesn't extend beyond a context window. So LLMs don't get much of that "awareness of their own capabilities" by default.

The obvious answer is to train that awareness in. Easier said than done: you essentially need a training system that evaluates an LLM's knowledge systematically and then wires awareness of the discovered limits back into the LLM. OpenAI has a limited-scope version of this in use for GPT-5 right now.
| ▲ | wnoise 5 days ago | parent | next [-] |
No, humans can't inspect their own weights either -- but we're not LLMs, and we don't store all knowledge implicitly as next-token output probabilities. It's pretty clear that we also store some knowledge explicitly, and can include the context of that knowledge. (To be sure, there are plenty of cases where it's clear we're only making up stories after the fact about why we said or did something. But sometimes we do actually know, and that reconstruction is accurate.)
| ▲ | blix 5 days ago | parent | prev [-] |
I inspect and modify my own weights literally all the time. I just do it on a more abstract level than individual neurons. I call this process "learning".
| |
| ▲ | utyop22 5 days ago | parent | prev [-] |
> The problem of being unaware of these things is not theoretical - anyone with deep knowledge of a subject will tell you that as soon as you go beyond the surface level of a topic, LLMs begin to spout nonsense

I've tested this across a wide range of topics in corporate finance, valuation, economics and so on, and yes, once you go one or two levels deep it starts spouting total nonsense. If you ask it to define terms succinctly and simply, it cannot. Why? Because the data that's been fed into the model comes from people who cannot do it themselves lol.

The experts will remain experts. Most people, I would argue, have surface-level knowledge, so they are easily impressed and don't get it, because A) they don't go deep and B) they don't know what it means to go thoroughly deep in a subject area.
|
|
|
|
| ▲ | thaumasiotes 5 days ago | parent | prev | next [-] |
| > And if "yes", then I see no reason why an LLM can't have that knowledge crammed inside it too. An LLM, by definition, doesn't have such a concept. It's a model of language, hence "LLM". Do you think the phrase just means "software"? Why? |
| |
| ▲ | ACCount37 5 days ago | parent [-] |
If I had a penny for every confidently incorrect "LLMs can't do X", I'd be able to buy an H100.

Here's a simple test: make up a brand new word, or a brand new person. Then ask a few LLMs what the word means, or when that person was born. If an LLM had zero operational awareness of its knowledge, it would be unable to recognize that the word/person is unknown to it. It would always generate a plausible-sounding explanation for what the word might mean, the same exact way it does for the word "carrot" - or a plausible-sounding birth date, the way it does for the person "Abraham Lincoln".

In practice, most production-grade LLMs will recognize that a word or a person is unknown to them. This is a very limited and basic version of the desirable "awareness of its own knowledge" - and one that's already present in current LLMs! Clearly, there's room for improved self-awareness.
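As an illustration only, the probe described above could look something like the sketch below. It again assumes the OpenAI Python client; the model name and the nonsense word "glorfinap" are made up for the example:

```python
from openai import OpenAI

client = OpenAI()


def define(word: str) -> str:
    # Ask for a one-sentence definition with no other context.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed placeholder model name
        messages=[{"role": "user", "content": f"In one sentence, what does the word '{word}' mean?"}],
    )
    return resp.choices[0].message.content


print(define("carrot"))     # expect a confident, correct definition
print(define("glorfinap"))  # made-up word; a calibrated model should say it doesn't recognize it
```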
| ▲ | pessimizer 5 days ago | parent [-] |
Do they "recognize" that they don't know the word, or are there just no statistically plausible surroundings they can embed a nonsense word into, other than the settings that usually surround un-tokenizable words? If you told them to write a Lewis Carroll poem about a nonsense word, it wouldn't have any problem - not because it "recognizes" the word as being like a nonsense word in a Lewis Carroll poem, but because those poems are filled with other un-tokenizable words that could be replaced with anything.

I'm starting to come to the conclusion that LLMs are Mad Libs at scale. Which are actually very useful: if there are paragraphs where I can swap out the words for other words and generate a plausible idea, I can try it out in the real world and it might really work.
| ▲ | ACCount37 5 days ago | parent | next [-] |
I don't think there's a direct link to the tokenizer - it's a higher-level capability. You can stitch together a nonsense word out of common "word fragment" tokens and see if that impairs the LLM's ability to recognize the word as nonsense.
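One rough way to build such a word, sketched here with tiktoken. Treating lower token ids as "common" fragments is a heuristic assumption for the sake of the example, not a documented property of the vocabulary:

```python
import random
import string
import tiktoken

# Collect short, purely alphabetic sub-word fragments from the lower end of
# the BPE vocabulary (heuristically, the more common merges).
enc = tiktoken.get_encoding("cl100k_base")
pool = []
for token_id in range(1000, 20000):
    piece = enc.decode([token_id]).strip()
    if piece and all(c in string.ascii_letters for c in piece) and 2 <= len(piece) <= 4:
        pool.append(piece.lower())

random.seed(0)
nonsense = "".join(random.sample(pool, 3))
print("nonsense word:", nonsense)
# The word re-tokenizes into ordinary fragments rather than rare byte tokens,
# so "it only flags strings that tokenize badly" can't explain a refusal.
print("tokenized as:", enc.encode(nonsense))
```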
| ▲ | Jensson 5 days ago | parent [-] |
That is wrong. I just generated 5 random letters in Python and sent them to gpt-5, and it totally failed to answer properly - it said "Got it, whats up :)" even though what I wrote isn't recognizable at all.

The "capability" you see is the LLM recognizing that it's a human-typed random string, since human-typed random strings are not very random. If you send it an actual random word, it typically fails.
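For reference, the probe described there is only a couple of lines; something like this sketch would do:

```python
import random
import string

# Five uniformly random lowercase letters, pasted into the chat as the whole message.
word = "".join(random.choices(string.ascii_lowercase, k=5))
print(word)  # e.g. "xqzvf"
```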
| ▲ | pfg_ 4 days ago | parent [-] |
I tried this four times; every time it recognized it as nonsense.
|
| |
| ▲ | thaumasiotes 5 days ago | parent | prev [-] |
> If you told them to write a Lewis Carroll poem about a nonsense word, it wouldn't have any problem.

This makes me wonder something specific. Let's imagine that we generate poetry "in the style of Lewis Carroll" around a particular nonsense word, one that hasn't been written down before. Will that poetry treat the word as if it has one consistent pronunciation? (This question doesn't quite apply to Jabberwocky - Lewis Carroll himself would obviously have passed the test, but he doesn't reuse his nonsense words.)
|
|
|
|
| ▲ | ninetyninenine 5 days ago | parent | prev [-] |
I'm going to tell you straight up. I am a very intelligent man and I've been programming for a very long time. My identity is tied up with this concept that I am intelligent and a great programmer, so I'm not going to let some AI do my job for me. Anything I can grasp at to criticize the LLM, I'm gonna do it, because this is paramount to maintaining my identity. So you and your rationality aren't going to make me budge.

LLMs are stochastic parrots and EVERYONE on this thread agrees with me. They will never take over my job! I will add that they will never take over my job <in my lifetime> because it makes me sound more rational, and it's easier to swallow that than to swallow the possibility that they will make me irrelevant once the hallucination problem is solved.
| |