efilife 6 days ago

> it cannot "logically reason" like a human does

Reason? Maybe. But there's one limitation that we currently have no idea how to overcome: LLMs don't know how much they know. If they tell you they don't know something, it may be a lie. If they tell you they do, that may be a lie too. I, a human, certainly know what I know and what I don't, and can recall where I know the information from.

vidarh 6 days ago | parent | next [-]

I have never met a human who has a good grasp of what they know and don't know. They may have a better grasp of it than an LLM, but humans are awfully bad at understanding the limits of our own knowledge, and will argue very strongly in favour of knowing more than we demonstrably do in all kinds of contexts.

bwfan123 6 days ago | parent | next [-]

> I have never met a human who has a good grasp of what they know and don't know

yep. There are 2 processes underlying our behaviors.

1) a meme-copier which takes in information from various sources and regurgitates it. A rote-memory machine of sorts. Here, memory is populated with info without deconstructing it into a theory. This has been termed "know-that". Here, explanations are shallow, and repeated questioning of why will fail with: I was told so.

2) a builder which attempts to construct a theory of something. Here, a mechanism is built and understood, and information is deconstructed in the context of the mechanism. This has been termed "know-how". Here, explanations are deeper, and repeated questioning of why will end up with: these are the "givens" in our theory.

Problem is that we operate in the "know-that" territory most of the time, and have not taken the effort to build theories for ourselves in the "know-how" territory.

ModernMech 6 days ago | parent | prev | next [-]

LLMs are not humans, they are ostensibly tools. Tools are supposed to come with a list of things they can do. LLMs don’t and are therefore bad at being tools, so we anthropomorphize them. But they are also not good at being people, so LLMs are left in this weird in-between state where half the people say they’re useful and half the people say they cause more problems than they solve.

vidarh 6 days ago | parent [-]

They are fantastic tools to me. They provide abilities that no other tools we have can provide.

That they also have flaws does not remove those benefits.

ModernMech 6 days ago | parent [-]

There's no question they have capabilities that no other tool has. But a good tool goes beyond just doing something; there are some generally agreed upon principles of tool design that make something good versus just useful.

For example, I think a hammer is a good tool because every time I swing it at a nail, it provides the force necessary to drive it into the wood. It's reliable. Sure, sometimes a hammer breaks, but my baseline expectation in using one is that every time I swing the hammer it will behave the same way.

Something more complicated, like a Rust compiler is also a good tool in the same way. It's vastly more intricate than the hammer, yet it still has the good tool property of being reliable; every time I press compile, if the program is wrong, then the compiler tells me that. If it's right, then the compiler passes every time. It doesn't lie, it doesn't guess, it doesn't rate limit, it doesn't charge a subscription, it doesn't silently update causing code to fail, it informs when changes are breaking and what they are, it allows me to pick my version and doesn't silently deprecate me without recourse, etc.

There are of course ecosystems out there where building a project is more like a delicate dance, or a courtship ritual, and those ecosystems are a pain in the ass to deal with. I'm talking XKCD #1987, or NodeJS circa 2014, or just the entire rationale behind Docker. People exit their careers to avoid dealing with such technology, because it's like working at the DMV or living in Kafka's nightmares. LLMs are more in that direction, and no one is going to like where we end up if we make them our entire stack, as seems to be the intent of the powers that be.

There's a difference between what LLMs are and what they're being sold as. For what they are, they can be useful, and maybe one day they will be turned into good tools if some of the major flaws are fixed.

On the other hand, we are in the process of totally upending the way our industry works on the basis of what these things will be, which they are selling as essentially an oracle. "The smartest person you know, in your pocket", "a simultaneous expert PhD, MD, JD", "smarter than all humans combined". But there's a giant gulf between what they're selling and what it is, and that gulf is what makes LLMs a poor tool.

vidarh 6 days ago | parent [-]

They are good tools to me.

We won't agree on this.

They provide abilities no other tool provides me with. They could be better, but they've still given me possibilities that I never had before without hiring humans.

ModernMech 6 days ago | parent [-]

I'm sure they're good for you, I'm not suggesting otherwise. What I'm saying is, if you ask 100 engineers to describe the properties of the best tools they use, the set of adjectives and characteristics they come up with will largely not be applicable to LLMs, and ChatGPT 5 doesn't change that.

vidarh 6 days ago | parent [-]

This is pure, unsubstantiated conjecture.

It's also wildly unrealistic conjecture, in my opinion.

The first, and most important, measure to me of a good tool is whether it makes me more productive. LLMs do. Measurably so.

You will certainly find people who don't believe LLMs do that for them, but that won't change the fact that for a lot of us they are an immensely good tool. And a tool doesn't need to fit everyone's processes to be a good tool.

ModernMech 5 days ago | parent [-]

> It's also wildly unrealistic conjecture, in my opinion.

The only property of good tools you've mentioned that LLMs have is they do something useful.

But are they reliable? No, they inexplicably work sometimes and don't work other times. Do they have a clear purpose? No, their purpose is muddled and they're sold as being capable of literally everything. Are the ergonomics good? No, the interface is completely opaque, accessed only through a natural language text box that comes with no instructions on how to use it. But then it must be intuitive, right? No, common wisdom on how to use them sounds more like astrological forecasts, or the kind of advice you give people trying to get something from a toddler -- "If you want good results, first you have to get into character!"... etc. etc.

> it makes me more productive.

Awesome! I'm sure they are doing exactly what you say and your experiences with them are amazing. But your and my personal productivity isn't the question facing the industry at the moment, or the topic of this discussion.

The question isn't whether you personally as an individual find these things useful in your process. If that were the question, I wouldn't be here complaining about them. I'm here because the powers that be are telling us that we must adopt these things into every aspect of our work lives as soon as possible, and that if we don't, we deserve to be left behind. THAT is what I'm here to talk about.

> And a tool doesn't need to fit everyones processes to be a good tool.

I haven't argued this. Of course a tool doesn't have to fit everyone's process to be good, but we have some generally accepted principles of good tool design, and LLMs don't follow them. That doesn't preclude you from using them well in your process.

Jensson 6 days ago | parent | prev [-]

> I have never met a human who has a good grasp of what they know and don't know

But humanity has a good grasp of it; we even created science to solve this. So out of all the existences we are aware of, humanity is by far the best at this; nothing else even comes remotely close.

vidarh 6 days ago | parent [-]

No, we really don't. We have major disagreements about what we know and don't know all the time.

Like right now.

efilife 5 days ago | parent [-]

An LLM needs to be told what it knows. You don't. It can never, with reasonable accuracy, say "I don't know" as a human would.

vidarh 5 days ago | parent [-]

And humans are often wrong both when we say we don't know, and when we claim to know. There likely is a difference in degree of accuracy, but the point I was making was that despite your claim to "certainly know what [you] know", we don't in fact know what we know with anything remotely near precision.

We know some of what we know, but we can both be pressed into doing things we are certain we don't know how to do but where the knowledge is still there, and we will confidently proclaim to know things (such as the extent of our knowledge) that we don't.

I will agree that LLMs need to acquire a better idea of what they know, but there is no reason to assume that knowing the limits of your own knowledge with any serious precision matters, given how bad humans are at knowing this.

So much of human culture, politics, and civil life is centered around resolving conflicts that arise out of our lack of knowledge of our own limits, that this uncertainty is fairly central to what it means to be human.

AaronAPU 6 days ago | parent | prev | next [-]

I’m afraid that sense of knowing what you know is very much illusory for humans as well. Everyone is just slowly having to come to terms with that.

efilife 6 days ago | parent [-]

No. I can tell you what skills I possess. For example: programming, writing music. An LLM cannot do this unless it's told what it knows.

hodgehog11 6 days ago | parent | prev | next [-]

You are judging this based on what the LLM outputs, not on its internals. When we peer into its internals, it seems that LLMs actually have a pretty good representation of what they do and don't know; this just isn't reflected in the output because the relevant information is lost in future context.

lblume 6 days ago | parent | prev | next [-]

Do you really know what you don't know? This would rule out unknown unknowns entirely.

add-sub-mul-div 6 days ago | parent | next [-]

Yes, it's not that people know specifically what they don't know, it's that they develop the wisdom to know those boundaries and anticipate them and reduce their likelihood and impact.

For example, if I use the language of my expertise for a familiar project then the boundaries where the challenges might lie are known. If I start learning a new language for the project I won't know which areas might produce unknowns.

The LLM will happily give you code in a language it's not trained well on. With the same confidence as using any other language.

efilife 6 days ago | parent | prev [-]

Sorry for copypasting from another comment but this is relevant

I can tell you what skills I possess. For example: programming, writing music. An LLM cannot do this unless it's told what it knows. I could also tell you whether I studied thing X or attempted to do it, and what success I had. So I'm pretty good at assessing my abilities. An LLM has no idea.

lblume 6 days ago | parent [-]

This is interesting, because I wouldn't be too sure about that. Whether I am able to play chess well exclusively depends on the strengths of my opponents, because there is no absolute baseline. If society somehow decided that what you were writing wasn't "music" anymore, you would be the only person left stating you had that skill.

I believe that most claims about one's own skills do come from outward judgement and interpretation of one's actions, not introspection. The only thing humans have is, to radically appropriate the jargon, a way longer context window (spanning over an entire lifetime!), together with many ways to compress it.

gallerdude 6 days ago | parent | prev | next [-]

> OpenAI researcher Noam Brown on hallucination with the new IMO reasoning model:

> Mathematicians used to comb through model solutions because earlier systems would quietly flip an inequality or tuck in a wrong step, creating hallucinated answers.

> Brown says the updated IMO reasoning model now tends to say “I’m not sure” whenever it lacks a valid proof, which sharply cuts down on those hidden errors.

> TLDR, the model shows a clear shift away from hallucinations and toward reliable, self‑aware reasoning.

Source: https://x.com/chatgpt21/status/1950606890758476264

mrcartmeneses 6 days ago | parent | prev [-]

Socrates would beg to differ