godelski a day ago

  > The toaster does not act independently of the human so it is not a closed system
I think you're mistaken. No, not about that; about the premise. I think everyone here agrees with that part. Where you're mistaken is in thinking this separates the toaster from the LLM: when I log in to Claude it says "How can I help you today?" and then waits for me.

No one is thinking that the toaster understands things. We're using it to point out how silly the claim of "task performance == understanding" is. Techblueberry furthered this by asking if the toaster suddenly becomes intelligent when you wrap it with a cron job. My point was about where the line is drawn. Is it at turning on the toaster? No, that would be silly, and you clearly agree. So you have to answer why the toaster doesn't understand toasting. That's the ask. Because clearly the toaster toasts bread.

You and robotresearcher have still avoided answering this question. It seems dumb, but it is the crux of the problem. The LLM is claimed to have understanding, right? It meets your criteria of task performance. But LLMs are still tools. They cannot act independently. I still have to prompt them. At an abstract level this is no different from the toaster. So, at what point does the toaster understand how to toast? You claim it doesn't, and I agree. You claim it doesn't because a human has to interact with it. I'm just saying that looping agents onto themselves doesn't magically make them intelligent, just like how I can automate the whole process from planting the wheat to toasting the toast.
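
To make the cron-job point concrete, here's a minimal sketch (all names hypothetical, nothing real being modeled): the scheduler changes when the mechanism fires, not what the mechanism is.

    import time

    class Toaster:
        def toast(self, bread):
            # A fixed mechanism: apply heat, return a result. No model
            # of "toast" exists anywhere in here.
            return f"toasted {bread}"

    def cron_wrapped(toaster, interval_s=60):
        # Automating the trigger adds scheduling, not understanding.
        while True:
            print(toaster.toast("bread"))
            time.sleep(interval_s)

The loop removes the human from the trigger, but the system's capabilities are identical. That's the whole point of the thought experiment.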

You're a mathematician. All I'm asking is that you abstract this out a bit and follow the logic. Clearly even our automated seed-to-buttered-toast-on-a-plate machine need not have understanding.

From my physics (and engineering) background there's a key thing I've learned: all measurements are proxies. This is no different. We don't have to worry about this detail in most everyday things because we're typically pretty good at measuring. But if you ever need to do something with precision, it becomes abundantly obvious. You even use this same methodology in math all the time, though I wouldn't say it's equivalent to taking a hard problem, creating an isomorphic map to an easier problem, solving it, then mapping back; there's an injective nature to it. A ruler doesn't measure distance; a ruler is a reference to distance. A laser range finder doesn't measure distance either; it is a photodetector and a timer. There is nothing in the world that you can measure directly.

If we cannot do this with physical things, it seems pretty silly to think we can do it with abstract concepts that we can't even create robust definitions for. It's not like we've directly measured the Higgs either. Do you think entropy is actually a measurement of intelligible speech? Is perplexity a good tool for identifying an entropy minimizer, or does it just correlate? Is FID a measurement of fidelity, or are we just using a useful proxy? I'm sorry, but I just don't think there are precise mathematical descriptions of things like natural English language or realistic human faces.

I've developed some of the best vision models out there, and I can tell you that you have to look at more than the paper: while these models produce fantastic images, they also produce some pretty horrendous ones. The fact that they statistically generate realistic images does not imply that they actually understand them.
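
As a concrete example of the proxy point: perplexity, the standard language-model metric, is just the exponential of the average negative log-likelihood. A minimal sketch, with made-up per-token log-probabilities (no real model here):

    import math

    def perplexity(token_log_probs):
        # exp of the average negative log-likelihood: it measures how
        # surprised the model is, not whether the text is understood.
        avg_nll = -sum(token_log_probs) / len(token_log_probs)
        return math.exp(avg_nll)

    fluent = [-0.2, -0.1, -0.3, -0.25]   # model finds these tokens likely
    garbled = [-3.1, -2.8, -3.5, -2.9]   # model finds these surprising
    print(perplexity(fluent))   # ~1.24
    print(perplexity(garbled))  # ~21.6

Low perplexity correlates with fluent text the way a ruler correlates with distance. It's a reference, not the thing itself.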

  > I'm no philosopher
Why not? It sounds like you are one. Do you not think about metamathematics? About what math means? Do you not think about math beyond the computation? If you do, I'd call you a philosopher. There's a P in PhD for a reason. We're not supposed to be automata. We're not supposed to be machine men, with machine minds and machine hearts.

  > This is a tremendous pain point ... researchers will live and die on standard benchmarks.
It is a pain we share. I see it outside CS as well, but I was shocked by the difference. Most of the other physicists and mathematicians I know who came over to CS were also surprised. And it isn't like physicists are known for their lack of egos lol

  > then you are still working in a more fortunate field
Oh, I've gotten the other comments too. That research never found publication, and at the end of the day I had to graduate. Though now it can be revisited. I was once surprised to find that I saved a paper from Max Welling's group: my fellow reviewers were confident in their rejections, but since they admitted to not understanding differential equations, the AC sided with me (maybe they could see Welling's name? I didn't know until months after). It barely got through a workshop, but it should have been in the main proceedings.

So I guess I'm saying I share this frustration. It's part of the reason I speak strongly here. I understand why people shift gears. But I think there's a big difference between begrudgingly getting on the train because you need to publish to survive, and actively fueling it while shouting that all other trains are broken and can never be fixed. One train to rule them all? I guess CS people love their binaries.

  > world model
I agree that looking at outputs tells us little about internal mechanisms. But proof isn't symmetric in difficulty either. A world model has to be consistent. I like vision because it gives us more clues in our evaluations; it lets us evaluate beyond metrics. If we're watching video from a POV perspective and we see a wall in front of us, turn left, then turn back, we should still expect to see that wall, and the same one. A world model is a model beyond what is seen from the camera's view. A world model is a physics model. And I mean /a/ physics model, not "physics"; there is no single physics model. Nor do I mean that a world model needs to have accurate physics. But it does need to make consistent and counterfactual predictions. Even the geocentric model is a world model (literally a model of worlds lol). The model of the world you have in your head is this. We don't close our eyes and conclude the wall in front of us will disappear. Someone may spin you around and you still won't conclude that, even if you have your coordinates wrong. The issue isn't so much memory as it is understanding that walls don't just appear and disappear. It is also understanding that this isn't always true of a cat.
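
Here's a minimal sketch of that wall test, with toy stand-ins I'm making up for illustration (this is not any real model's API): one "model" predicts only from the current frame, the other keeps scene state independent of the camera.

    class FrameOnlyModel:
        # Predicts the view from the current frame alone.
        def __init__(self):
            self.view = "empty"

        def observe(self, view):
            self.view = view

        def turn(self, direction):
            # No persistent state: whatever leaves the frame is gone.
            self.view = "empty"

        def render(self):
            return self.view

    class WorldStateModel:
        # Keeps a map of the scene, independent of the camera heading.
        def __init__(self):
            self.scene = {}
            self.heading = 0

        def observe(self, view):
            self.scene[self.heading] = view

        def turn(self, direction):
            step = 90 if direction == "left" else -90
            self.heading = (self.heading + step) % 360

        def render(self):
            return self.scene.get(self.heading, "empty")

    for model in (FrameOnlyModel(), WorldStateModel()):
        model.observe("wall")   # a wall is in view
        model.turn("left")      # look away...
        model.turn("right")     # ...and look back
        print(type(model).__name__, model.render())
    # FrameOnlyModel empty  -> the wall vanished while unobserved
    # WorldStateModel wall  -> the wall persists off-camera

The second one passes the turn-away test not because it's smarter, but because it models something beyond the current frame. That's the distinction I mean by "world model".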

I referenced the game engines because while they are impressive they are not self-consistent. Walls will disappear. An enemy shooting at you will sometimes disappear if you just stop looking at it. The world doesn't disappear when I close my eyes. A tree falling in a forest still creates acoustic vibrations in the air even if there is no one there to hear it.

A world model is exactly that: a model of a world. It is a superset of a model of a camera view. It is a model of the things in the world and how they interact, regardless of whether they are visible. Accuracy isn't actually the defining feature here, though it is a strong hint, at least for identifying poor world models.

I know this last part is a bit more rambly and harder to convey. But I hope the intention came across.

robotresearcher 13 hours ago

> You and robotresearcher have still avoided answering this question.

I have repeatedly explicitly denied the meaningfulness of the question. Understanding is a property ascribed by an observer, not possessed by a system.

You may not agree, but you can’t maintain that I’m avoiding that question. It does not have an answer that matters; that is my specific claim.

You can say a toaster understands toasting or you can not. There is literally nothing at stake there.

godelski 12 hours ago

You said LLMs are intelligent because they do tasks. But that claim is inconsistent with the toaster example.

If a toaster isn't intelligent because I have to give it bread and press the button to start, then how's that any different from giving an LLM a prompt and pressing the button to start?

It's never been about the toaster. You're avoiding answering the question. I don't believe you're dumb, so don't act the part. I'm not buying it.

robotresearcher 11 hours ago

I didn’t describe anything as intelligent or not intelligent.

I’ll bow out now. Not fun to be ascribed views I don’t have, despite trying to be as clear as I can.