CamperBob2 | 3 days ago
> That's post-training

No, it is not. Read the paper. They are discussing an emergent property of the context itself: "For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model."

> I'm talking about tasks like multiplying two 4-digit numbers (let's say 8-digit, just to be safe, for reasoning models), which 5th or 6th graders in the US are expected to be able to do with no problem - without using a calculator.

So am I. See, for example, Karpathy's discussion of native computation: https://youtu.be/7xTGNNLPyMI?si=Gckcmp2Sby4SlKje&t=6416 (starts at 1:46:56). The first few tokens in the context actually serve as some sort of substrate for general computation. I don't pretend to understand that, and it may still be something of an open research topic, but it's one more unexpected emergent property of transformers.

You'd be crazy to trust that property at this stage -- at the time Karpathy was making the video, he needed to explicitly tell the model to "Use code" if he didn't want it to just make up solutions to more complex problems -- but you'd also be crazy to trust answers from a 5th-grader who just learned long division last week.

Feeling quite comfortable being classed in the same camp with the greatest living physicist, honestly. Not a great time for you to rest on your intellectual laurels. Same goes for Penrose.
pegasus | 3 days ago | parent
> No, it is not.

Yes, it is. You seem to have misunderstood what I wrote. The critique I was pointing to is about the number of examples and the amount of energy needed during model training, which is what the "learning" in "machine learning" actually refers to. The paper uses GPT-3, which had already absorbed all that data and electricity. And the "learning" the paper talks about is arguably not real learning, since none of the acquired skills persists beyond the end of the session.

> So am I.

This is easy to settle. Go check any frontier model and see how far it gets multiplying numbers with tool calling disabled.

> Not a great time for you to rest on your intellectual laurels. Same goes for Penrose.

I am neither resting, nor are there many laurels to rest on, at least compared to someone like Penrose. As for him, give the man a break: he's 94 years old and still sharp as a tack and intellectually productive. You're the one who's resting, imagining you've settled a question which is very much still open. Certainty is certainly intoxicating, so I understand where you're coming from, but claiming that anyone who doubts computationalism brings no arguments to the table is patently absurd.
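For anyone who wants to run the "check a frontier model" experiment themselves, here is a minimal sketch of a test harness. The problem generation and scoring are plain Python; `ask_model` is a hypothetical stand-in for whatever chat API you use, and is deliberately left unimplemented.

```python
import random


def make_problems(n, digits=4, seed=0):
    """Generate n random multiplication problems with ground-truth answers."""
    rng = random.Random(seed)
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    return [(a, b, a * b)
            for a, b in ((rng.randint(lo, hi), rng.randint(lo, hi))
                         for _ in range(n))]


def score(answers, problems):
    """Fraction of answers that exactly match the true product."""
    correct = sum(1 for ans, (_, _, truth) in zip(answers, problems)
                  if ans == truth)
    return correct / len(problems)


# Hypothetical usage -- ask_model would send "What is {a} * {b}?" to a
# model with tool use disabled and parse an integer from the reply:
#
#   problems = make_problems(50, digits=4)
#   answers = [ask_model(a, b) for a, b, _ in problems]
#   print(score(answers, problems))
```

Running this at 4 digits versus 8 digits, with and without an instruction like "use code", would settle the disagreement above with actual numbers rather than intuitions.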