| ▲ | kumarvvr 3 days ago |
| I have an issue with the words "understanding", "reasoning", etc. when talking about LLMs. Are they really understanding, or just putting out a stream of probabilities? |
|
| ▲ | munchler 3 days ago | parent | next [-] |
| Does it matter from a practical point of view? It's either true understanding or it's something else that's similar enough to share the same name. |
| |
| ▲ | axdsk 3 days ago | parent [-] | | The polygraph is a good example. The "lie detector" label is used to mislead people; what the instrument actually measures is autonomic arousal. I think these misnomers can cause real issues, like thinking the LLM is "reasoning". | | |
| ▲ | dexterlagan 3 days ago | parent [-] | | Agreed, but in the case of the lie detector it seems to be a matter of interpretation. In the case of LLMs, what is it? Is it a matter of saying "It's a next-word calculator that uses stats, matrices and vectors to predict output" instead of "Reasoning simulation made using a neural network"? Is there a better name? I'd say it's "a static neural network that outputs a stream of words after having consumed textual input, and that can be used to simulate, with a high level of accuracy, the internal monologue of a person thinking about and reasoning about the input". Whatever it is, it's not reasoning, but it's not a parrot either. |
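As a rough illustration of the "next-word calculator" framing above, here is a minimal sketch (toy tokens and scores, no real model or API) of how raw scores become a probability distribution and then one sampled next token:

```python
import math
import random

# Toy example: hypothetical raw scores ("logits") for a few candidate next tokens.
# A real LLM scores tens of thousands of tokens, but the principle is the same.
logits = {"cat": 2.1, "dog": 1.7, "reasoning": 0.3, "the": -0.5}

# Softmax turns the raw scores into a probability distribution.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

# The "stream of probabilities": sample one token, append it to the text, repeat.
next_token = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
print(probs)       # e.g. {'cat': 0.52, 'dog': 0.35, ...}
print(next_token)  # one sampled continuation
```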
|
|
|
| ▲ | sema4hacker 3 days ago | parent | prev | next [-] |
| The latter. When "understand", "reason", "think", "feel", "believe", or any of a long list of similar words appears in a title, it immediately makes me think the author has already drunk the Kool-Aid. |
| |
| ▲ | manveerc 3 days ago | parent | next [-] | | In the context of coding agents, they do simulate “reasoning”: when you feed them the output, they are able to correct themselves. | |
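To make that self-correction loop concrete, here is a hedged sketch; `generate_code` and `run_tests` are hypothetical placeholders standing in for a real coding-model call and a test runner, not any particular agent framework:

```python
def generate_code(prompt: str) -> str:
    """Placeholder for a call to a coding model."""
    raise NotImplementedError


def run_tests(code: str) -> tuple[bool, str]:
    """Placeholder for executing the code/tests; returns (passed, output)."""
    raise NotImplementedError


def agent_loop(task: str, max_attempts: int = 3) -> str | None:
    """Generate code, run it, and feed failures back until it passes or we give up."""
    prompt = task
    for _ in range(max_attempts):
        code = generate_code(prompt)
        passed, output = run_tests(code)
        if passed:
            return code
        # The "self-correction": the failure output becomes part of the next prompt.
        prompt = f"{task}\n\nThe previous attempt failed with:\n{output}\n\nPlease fix it."
    return None
```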
| ▲ | qwertytyyuu 3 days ago | parent | prev | next [-] | | I agree with “feel” and “believe”, but what words would you suggest instead of “understand” and “reason”? | | |
| ▲ | sema4hacker 3 days ago | parent [-] | | None. Don't anthropomorphize at all. Note that "understanding" has now been removed from the HN title, but not from the linked PDF. | | |
| ▲ | platypii 3 days ago | parent [-] | | Why not? We are trying to evaluate AI's capabilities. It's OBVIOUS that we should compare it to our only prior example of intelligence -- humans. Saying we shouldn't compare or anthropomorphize machines is a ridiculous hill to die on. | | |
| ▲ | sema4hacker 2 days ago | parent [-] | | If you are comparing the performance of a computer program with the performance of a human, then using terms that imply they both "understand" wrongly suggests they work in the same, human-like way, and that ends up misleading lots of people, especially those who have no idea (understanding!) of how these models work. Great for marketing, though. |
|
|
| |
| ▲ | vexna 3 days ago | parent | prev [-] | | Kool-Aid or not -- "reasoning" is already part of the LLM verbiage (e.g. `reasoning` models having a `reasoningBudget`).
The meaning might not be 1:1 with human reasoning, but when the LLM shows its "reasoning" it does _appear_ to be a train of thought. If I had to give what it's doing a name (as if I were naming a function), I'd be hard-pressed not to go with something like `reason`/`think`. | | |
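In practice, a "reasoning budget" typically surfaces as just another request parameter. The sketch below is purely hypothetical: the client object, the `complete` method, and the `reasoning_budget` name are invented for illustration (vendor SDKs spell this differently, e.g. the `reasoningBudget` mentioned above):

```python
def ask(client, question: str, reasoning_budget: int = 1024) -> str:
    """Request an answer while capping the tokens the model may spend on its
    visible "reasoning" trace. `client.complete` is a made-up method name."""
    response = client.complete(
        prompt=question,
        reasoning_budget=reasoning_budget,  # tokens reserved for the "train of thought"
    )
    return response.text
```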
|
|
| ▲ | hodgehog11 3 days ago | parent | prev | next [-] |
| What does understanding mean? Is there a sensible model for it? If not, we can only judge in the same way that we judge humans: by conducting examinations and determining whether the correct conclusions were reached. Probabilities have nothing to do with it; by any appropriate definition, there exist statistical models that exhibit "understanding" and "reasoning". |
| |
|
| ▲ | jmpeax 3 days ago | parent | prev | next [-] |
| Do you yourself really understand, or are you just depolarizing neurons that have reached their threshold? |
| |
| ▲ | octomind 3 days ago | parent | next [-] | | It can be simultaneously true that human understanding is just neurons firing and that the architecture and function of those neural structures are vastly different from what an LLM is doing internally, such that the two are not really the same. I'd encourage you to read Apple’s recent paper on thinking models; I think it’s pretty clear that the way LLMs encode the world is drastically inferior to what the human brain does. I also believe that could be fixed with the right technical improvements, but it just isn’t the case today. |
| ▲ | dmead 3 days ago | parent | prev | next [-] | | He doesn't know the answer to that and neither do you. | |
| ▲ | lucisferre 3 days ago | parent | prev [-] | | [flagged] | | |
|
|
| ▲ | dang 3 days ago | parent | prev [-] |
| OK, we've removed all understanding from the title above. |
| |
| ▲ | fragmede 3 days ago | parent [-] | | Care to provide reasoning as to why? | | |
| ▲ | dang 3 days ago | parent [-] | | The article's title was longer than 80 chars, which is HN's limit. There's more than one way to truncate it. The previous truncation ("From GPT-4 to GPT-5: Measuring Progress in Medical Language Understanding") was baity in the sense that the word 'understanding' was provoking objections and taking us down a generic tangent about whether LLMs really understand anything or not. Since that wasn't about the specific work (and since generic tangents are basically always less interesting*), it was a good idea to find an alternate truncation. So I took out the bit that was snagging people ("understanding") and instead swapped in "MedHELM". Whatever that is, it's clearly something in the medical domain and has no sharp edge of offtopicness. Seemed fine, and it stopped the generic tangent from spreading further. * https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... | | |
| ▲ | fragmede 3 days ago | parent [-] | | Well thought out, thank you! Generic Tangents is my new band's name. |
|
|
|