CamperBob2 3 days ago

> For example, the fact that they can't learn from few examples

For one thing, yes, they can, obviously [1] -- when's the last time you checked? -- and for another, there are plenty of humans who seemingly cannot.

The only real difference is that with an LLM, when the context is lost, so is the learning. That will obviously need to be addressed at some point.

> that they can't perform simple mathematical operations without access to external help (via tool calling)

And yet you are fine with humans requiring a calculator to perform similar tasks? Many humans are worse at basic arithmetic than an unaided transformer network. And, tellingly, we make the same kinds of errors.

> or that they have to expend so much more energy to do their magic (and yes, to me they are a bit magical), which makes some wonder if what these models do is a form of refined brute-force search, rather than ideating.

Well, of course, all they are doing is searching and curve-fitting. To me, the magical thing is that they have shown us, more or less undeniably (Penrose notwithstanding), that that is all we do. Questions that have been asked for thousands of years have now been answered: there's nothing special about the human brain, except for the ability to form, consolidate, consult, and revise long-term memories.

1: E.g., https://arxiv.org/abs/2005.14165 from 2020

pegasus 3 days ago

> For one thing, yes, they can

That's post-training. The complaint I'm referring to is about the huge amounts of data (and energy) required during training, which is also a form of learning, after all. Sure, there are counter-arguments, for example pointing to the huge amount of non-textual data a child ingests, but these counter-arguments are not watertight themselves (for example, one can point out that we are discussing text-only tasks). The discussion can go on and on; my point was only that cogent arguments are indeed often presented, which you were denying above.

> there are plenty of humans who seemingly cannot

This particular defense of LLMs has always puzzled me. By this measure, simply because sufficiently impaired humans exist, AGI was achieved many decades ago.

> But yet you are fine with humans requiring a calculator to perform similar tasks

I'm talking about tasks like multiplying two 4-digit numbers (let's say 8-digit, just to be safe, for reasoning models), which 5th or 6th graders in the US are expected to be able to do with no problem - without using a calculator.
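
For concreteness, the grade-school procedure I have in mind is just this, as a rough Python sketch (nothing about it is model-specific):

    # Grade-school long multiplication: one partial product per digit of the
    # second factor, shifted by its place value, then summed.
    def long_multiply(a: int, b: int) -> int:
        total = 0
        for place, digit_char in enumerate(reversed(str(b))):
            partial = a * int(digit_char)    # one written row of the method
            total += partial * 10 ** place   # shift by the digit's place value
        return total

    assert long_multiply(4738, 9215) == 4738 * 9215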

> To me, the magical thing is that they have shown us, more or less undeniably (Penrose notwithstanding), that that is all we do.

Or, to put it more tersely, they have shown you that that is all we do. Penrose, myself, and lots of others don't see it quite like that. (Feeling quite comfortable being classed in the same camp with the greatest living physicist, honestly. ;) To me what LLMs do is approximate one aspect of our minds. But I have a strong hunch that the rabbit hole goes much deeper, your assessment notwithstanding.

CamperBob2 3 days ago

> That's post-training

No, it is not. Read the paper. They are discussing an emergent property of the context itself: "For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model."
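
To make that concrete: in-context "learning" is nothing more than putting worked demonstrations in the prompt. Here's a minimal sketch (the invented-word task is my own illustration of the format, not a quote from the paper):

    # Few-shot / in-context learning: the "training examples" live entirely in
    # the prompt text, and no weights are updated anywhere.
    few_shot_prompt = (
        'A "wug" is a small bird. The plural of wug is wugs.\n'
        'A "blicket" is a kind of tool. The plural of blicket is blickets.\n'
        'A "farduddle" is a quick jump. The plural of farduddle is'
    )

    # Feed few_shot_prompt to any text-completion model; a model that has
    # picked up the pattern continues with " farduddles", purely from context.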

> I'm talking about tasks like multiplying two 4-digit numbers (let's say 8-digit, just to be safe, for reasoning models), which 5th or 6th graders in the US are expected to be able to do with no problem - without using a calculator.

So am I. See, for example, Karpathy's discussion of native computation: https://youtu.be/7xTGNNLPyMI?si=Gckcmp2Sby4SlKje&t=6416 (starts at 1:46:56). The first few tokens in the context actually serve as some sort of substrate for general computation. I don't pretend to understand that, and it may still be something of an open research topic, but it's one more unexpected emergent property of transformers.

You'd be crazy to trust that property at this stage -- at the time Karpathy was making the video, he needed to explicitly tell the model to "Use code" if he didn't want it to just make up solutions to more complex problems -- but you'd also be crazy to trust answers from a 5th-grader who just learned long division last week.

> Feeling quite comfortable being classed in the same camp with the greatest living physicist, honestly.

Not a great time for you to rest on your intellectual laurels. Same goes for Penrose.

pegasus 3 days ago

> No, it is not.

Yes, it is. You seem to have misunderstood what I wrote. The critique I was pointing to concerns the number of examples (and the amount of energy) needed during model training, which is what the "learning" in "machine learning" actually refers to. The paper uses GPT-3, which had already absorbed all that data and electricity. And the "learning" the paper talks about is arguably not real learning, since none of the acquired skills persists beyond the end of the session.

> So am I.

This is easy to settle. Go check any frontier model and see how far they get with multiplying numbers with tool calling disabled.
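
Something along these lines is all it takes. This is a sketch using the OpenAI Python client; the model name is just a placeholder, and since no tools parameter is passed, there is no calculator for the model to call:

    # Quiz a chat model on n-digit multiplication with tool calling unavailable.
    import random
    from openai import OpenAI

    client = OpenAI()  # assumes an API key in the environment

    def multiplication_accuracy(n_digits=4, trials=10, model="gpt-4o"):
        correct = 0
        for _ in range(trials):
            a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
            b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
            resp = client.chat.completions.create(
                model=model,  # placeholder; substitute the frontier model under test
                messages=[{"role": "user",
                           "content": f"What is {a} * {b}? Reply with only the number."}],
                # No `tools` are passed, so there is no calculator to fall back on.
            )
            answer = resp.choices[0].message.content.strip().replace(",", "")
            correct += (answer == str(a * b))
        return correct / trials

    print(multiplication_accuracy(n_digits=4))
    print(multiplication_accuracy(n_digits=8))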

> Not a great time for you to rest on your intellectual laurels. Same goes for Penrose.

Neither am I resting, nor are there many laurels to rest on, at least compared to someone like Penrose. As for him, give the man a break, he's 94 years old and still sharp as a tack and intellectually productive. You're the one who's resting, imagining you've settled a question which is very much still open. Certainty is certainly intoxicating, so I understand where you're coming from, but claiming that anyone who doubts computationalism brings no arguments to the table is patently absurd.

CamperBob2 2 days ago

> Yes, it is. You seem to have misunderstood what I wrote. The critique I was pointing to concerns the number of examples (and the amount of energy) needed during model training, which is what the "learning" in "machine learning" actually refers to. The paper uses GPT-3, which had already absorbed all that data and electricity. And the "learning" the paper talks about is arguably not real learning, since none of the acquired skills persists beyond the end of the session.

Nobody is arguing about power consumption in this thread (but see below), and in any case power consumption is dominated by one-time training and by the load of serving millions of prompts at once. Processing an individual prompt costs almost nothing.

And it's already been stipulated that lack of long-term memory is a key difference between AI and human cognition. Give them some time, sheesh. This stuff's brand new.

> This is easy to settle. Go check any frontier model and see how far they get with multiplying numbers with tool calling disabled.

Yes, it is very easy to settle. I ran this session locally in Qwen3-Next-80B-A3B-Instruct-Q6_K: https://pastebin.com/G7Ewt5Tu

This is a 6-bit quantized version of a free model that is very far from frontier level. It traces its lineage through DeepSeek, which was likely RL-trained by GPT 4.something. So 2 out of 4 isn't bad at all, really. My GPU's power consumption went up by about 40 watts while running these queries, a bit more than a human brain.
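
For anyone who wants to repeat the experiment locally, the setup is roughly this. It's a sketch using llama-cpp-python; the GGUF filename is a placeholder, and it assumes your llama.cpp build supports this model's architecture:

    # Local, tool-free multiplication query against a quantized GGUF model.
    from llama_cpp import Llama

    llm = Llama(
        model_path="Qwen3-Next-80B-A3B-Instruct-Q6_K.gguf",  # placeholder path
        n_ctx=8192,
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user",
                   "content": "Compute 4738 * 9215. Show your working, "
                              "but do not use code or tools."}],
        max_tokens=1024,
    )
    print(out["choices"][0]["message"]["content"])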

If I ask the hardest of those questions on Gemini 3, it gets the right answer but definitely struggles: https://pastebin.com/MuVy9cNw

> As for him, give the man a break, he's 94 years old and still sharp as a tack and intellectually productive.

(Shrug) As long as he chooses to contribute his views to public discourse, he's fair game for criticism. You don't have to invoke quantum woo to multiply numbers without specialized tools, as the tests above show. Consequently, I believe that a heavy burden of proof lies with anyone who invokes quantum woo to explain any other mental operations. It's a textbook violation of Occam's Razor.