godelski | a day ago
Is this falsifiable? Even restricting to those currently living? On what tests? In which way? Does the category of error matter?
I'd argue you didn't understand the examples from my previous comment or the direct reply[0]. Does it become a duck as soon as you are able to trick an ornithologist? All ornithologists?

But yes. Is it fair if I use Go instead of Chess? Game 4 with Lee Sedol seems an appropriate example. Vafa also has some good examples[1,2]. But let's take an even more theoretical approach. Chess is technically a solvable game, since it is finite, deterministic, and perfect-information: you can in principle compute an optimal strategy from any valid state. The problem is that this is intractable, because the number of state-action pairs is so large. But the number of moves isn't the critical part here, so let's look at Tic-Tac-Toe. We can pretty easily program up a machine that will not lose. We can put all actions and states into a graph and fit that on a computer no problem (a short sketch of what that looks like follows the footnotes below). Would you really say that the program understands Tic-Tac-Toe better than a human does? I'm not sure we should even say it understands the game at all.

I don't think the situation is resolved by switching to unsolved (or effectively unsolved) games. That's the point of the Heliocentric/Geocentric example. The Geocentric model gave many accurate predictions, but I would find it surprising if you suggested an astronomer of that time, with deep expertise in the subject, understood the configuration of the solar system better than a modern child who understands Heliocentrism. Their model makes accurate predictions, certainly more accurate than that child would make, but their model is wrong. It took quite a long time for Heliocentrism not just to be proven correct, but also to make better predictions than Geocentrism in all situations.

So I see 2 critical problems here. 1) The more accurate model[3] can be less developed, resulting in lower predictive capability despite being a much more accurate representation of the verifiable environment. Accuracy and precision are different, right? 2) Test performance says nothing about coverage/generalization[4]. We can't prove our code is error-free through test cases. We use them to bound our confidence (a very useful feature! I'm not against tests, but as you say, caution is good).

In [0] I referenced Dyson; I'd appreciate it if you watched that short video (again, if it's been some time). How do you know you aren't making the same mistake Dyson almost did? The mistake he would have made had he not trusted Fermi? Remember, Fermi's predictions were accurate, and they even stood for years. If your answer is time, then I'm not convinced it is a sufficient explanation. It doesn't explain Fermi's "intuition" (understanding) and just kicks the can down the road. You wouldn't be able to differentiate yourself from Dyson's mistake. So why not exercise caution?

And to be clear, you are the one making the stronger claim: "understanding has a well-defined definition." My claim is that yours is insufficient. I'm not claiming I have an accurate and precise definition; my claim is that we need more work to get the precision. I believe your claim can be a useful abstraction (and certainly has been!), but there are more than enough problems with it that we shouldn't hold to it so tightly. To use it as "proof" is naive. It is equivalent to claiming your code is error-free because it passes all test cases.

[0] https://news.ycombinator.com/item?id=45622156

[1] https://arxiv.org/abs/2406.03689

[2] https://arxiv.org/abs/2507.06952

[3] Certainly placing the Earth at the center of the solar system (or universe!) is a larger error than placing the Sun at the center of the solar system and failing to predict the tides or the retrograde motion of Mercury.

[4] This gets exceedingly complex as we start to differentiate from memorization. I'm not sure we need to dive into how far from the training data a sample needs to be to make it a reasonable piece of test data, but that is a question that can't be ignored forever.
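To make the Tic-Tac-Toe point concrete, here is a minimal sketch of what "put all actions and states into a graph" can look like: a plain negamax search with memoization over the full game tree, which is tiny for Tic-Tac-Toe. All names below are illustrative assumptions, not taken from anything above; the only point is that a program this small plays perfectly without anything we would ordinarily call understanding.

  # Minimal sketch: exhaustive negamax over the full Tic-Tac-Toe game tree.
  # The tree is small enough to search outright, so this player never loses.
  from functools import lru_cache

  LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

  def winner(board):
      # board is a 9-character string of 'X', 'O', or ' '
      for a, b, c in LINES:
          if board[a] != ' ' and board[a] == board[b] == board[c]:
              return board[a]
      return None

  @lru_cache(maxsize=None)
  def value(board, player):
      # Best achievable outcome for the player to move: +1 win, 0 draw, -1 loss.
      w = winner(board)
      if w is not None:
          return 1 if w == player else -1
      if ' ' not in board:
          return 0  # draw
      opponent = 'O' if player == 'X' else 'X'
      # Our value for a move is the negation of the opponent's value afterwards.
      return max(-value(board[:i] + player + board[i+1:], opponent)
                 for i, cell in enumerate(board) if cell == ' ')

  def best_move(board, player):
      opponent = 'O' if player == 'X' else 'X'
      moves = [i for i, cell in enumerate(board) if cell == ' ']
      return max(moves, key=lambda i: -value(board[:i] + player + board[i+1:], opponent))

  # Example: X to move on an empty board; any move returned guarantees at least a draw.
  print(best_move(' ' * 9, 'X'))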
robotresearcher | 15 hours ago | parent
>> any human who ever lived

> Is this falsifiable? Even restricting to those currently living? On what tests? In which way? Does the category of error matter?

Software reliably beats the best players that have ever played it in public, including Kasparov and Carlsen, the best players of my lifetime (to my limited knowledge). By analogy to the performance ratchet we see in the rest of sports and games, we might reasonably assume that these dominant living players are the best the world has ever seen. That could be wrong. But my argument does not hang on this point, so asking about falsifiability here doesn't do any work. Of course it's not falsifiable. Y'know what else is not falsifiable? "That AI doesn't understand what it's doing."
> I'd argue you didn't understand the examples from my previous comment or the direct reply[0]. Does it become a duck as soon as you are able to trick an ornithologist? All ornithologists?

No one seems to have changed their opinion about anything in the wake of AIs routinely passing the Turing Test. People are fooled by the chatbot passing as a human, and then ask about ducks instead. The most celebrated and seriously considered quacks-like-a-duck argument has been won by the AIs, and no one cares.

By the way, the ornithologists' criterion for being a duck is probably genetic and has little to do with behavior. A dead duck is still a duck. And because we know what a duck is, no one is yelling at ducks that 'they don't really duck', telling duck makers they need a revolution in duck making, or warning that they are doomed to failure if they don't listen. Not so with 'understanding'.