CamperBob2 | 3 days ago
> That's post-training

No, it is not. Read the paper. They are discussing an emergent property of the context itself: "For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model."

> I'm talking about tasks like multiplying two 4-digit numbers (let's say 8-digit, just to be safe, for reasoning models), which 5th or 6th graders in the US are expected to be able to do with no problem - without using a calculator.

So am I. See, for example, Karpathy's discussion of native computation: https://youtu.be/7xTGNNLPyMI?si=Gckcmp2Sby4SlKje&t=6416 (starts at 1:46:56). The first few tokens in the context actually serve as some sort of substrate for general computation. I don't pretend to understand that, and it may still be something of an open research topic, but it's one more unexpected emergent property of transformers.

You'd be crazy to trust that property at this stage -- at the time Karpathy was making the video, he needed to explicitly tell the model to "Use code" if he didn't want it to just make up solutions to more complex problems -- but you'd also be crazy to trust answers from a 5th-grader who just learned long division last week.

Feeling quite comfortable being classed in the same camp with the greatest living physicist, honestly. Not a great time for you to rest on your intellectual laurels. Same goes for Penrose.
pegasus | 3 days ago | parent
> No, it is not.

Yes, it is. You seem to have misunderstood what I wrote. The critique I was pointing to is about the number of examples and the amount of energy needed during model training, which is what the "learning" in "machine learning" actually refers to. The paper uses GPT-3, which had already absorbed all that data and electricity. And the "learning" the paper talks about is arguably not real learning, since none of the acquired skills persists beyond the end of the session.

> So am I.

This is easy to settle. Go check any frontier model and see how far it gets multiplying numbers with tool calling disabled.

> Not a great time for you to rest on your intellectual laurels. Same goes for Penrose.

I am neither resting, nor are there many laurels to rest on, at least compared to someone like Penrose. As for him, give the man a break: he's 94 years old and still sharp as a tack and intellectually productive. You're the one who's resting, imagining you've settled a question which is very much still open. Certainty is certainly intoxicating, so I understand where you're coming from, but claiming that anyone who doubts computationalism brings no arguments to the table is patently absurd.
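For anyone who wants to run the "check a frontier model" experiment themselves, here is a minimal sketch of a test harness. The problem generation and scoring are plain Python; `ask_model` is a hypothetical stand-in for whatever chat API you use, and is deliberately left unimplemented.

```python
import random


def make_problems(n, digits=4, seed=0):
    """Generate n random multiplication problems with ground-truth answers."""
    rng = random.Random(seed)
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    return [(a, b, a * b)
            for a, b in ((rng.randint(lo, hi), rng.randint(lo, hi))
                         for _ in range(n))]


def score(answers, problems):
    """Fraction of answers that exactly match the true product."""
    correct = sum(1 for ans, (_, _, truth) in zip(answers, problems)
                  if ans == truth)
    return correct / len(problems)


# Hypothetical usage -- ask_model would send "What is {a} * {b}?" to a
# model with tool use disabled and parse an integer from the reply:
#
#   problems = make_problems(50, digits=4)
#   answers = [ask_model(a, b) for a, b, _ in problems]
#   print(score(answers, problems))
```

Running this at 4 digits versus 8 digits, with and without an instruction like "use code", would settle the disagreement above with actual numbers rather than intuitions.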