| ▲ | pegasus 3 days ago |
> For one thing, yes, they can

That's post-training. The complaint I'm referring to is the huge amount of data (and energy) required during training, which is also a form of learning, after all. Sure, there are counter-arguments, for example pointing to the huge amount of non-textual data a child ingests, but these counter-arguments are not watertight themselves (for example, one can point out that we are discussing text-only tasks). The discussion can go on and on; my point was only that cogent arguments are indeed often presented, which you were denying above.

> there are plenty of humans who seemingly cannot

This particular defense of LLMs has always puzzled me. By this measure, simply because there are sufficiently impaired humans, AGI was already achieved many decades ago.

> But yet you are fine with humans requiring a calculator to perform similar tasks

I'm talking about tasks like multiplying two 4-digit numbers (let's say 8-digit, just to be safe, for reasoning models), which 5th or 6th graders in the US are expected to be able to do with no problem, without using a calculator.

> To me, the magical thing is that they have shown us, more or less undeniably (Penrose notwithstanding), that that is all we do.

Or, to put it more tersely, they have shown you that that is all we do. Penrose, myself, and lots of others don't see it quite like that. (Feeling quite comfortable being classed in the same camp as the greatest living physicist, honestly. ;)

To me, what LLMs do is approximate one aspect of our minds. But I have a strong hunch that the rabbit hole goes much deeper, your assessment notwithstanding.
| ▲ | CamperBob2 3 days ago | parent [-] |
> That's post-training

No, it is not. Read the paper. They are discussing an emergent property of the context itself: "For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model."

> I'm talking about tasks like multiplying two 4-digit numbers (let's say 8-digit, just to be safe, for reasoning models), which 5th or 6th graders in the US are expected to be able to do with no problem - without using a calculator.

So am I. See, for example, Karpathy's discussion of native computation: https://youtu.be/7xTGNNLPyMI?si=Gckcmp2Sby4SlKje&t=6416 (starts at 1:46:56). The first few tokens in the context actually serve as some sort of substrate for general computation. I don't pretend to understand that, and it may still be something of an open research topic, but it's one more unexpected emergent property of transformers.

You'd be crazy to trust that property at this stage -- at the time Karpathy was making the video, he needed to explicitly tell the model to "Use code" if he didn't want it to just make up solutions to more complex problems -- but you'd also be crazy to trust answers from a 5th-grader who just learned long division last week.

> Feeling quite comfortable being classed in the same camp with the greatest living physicist, honestly.

Not a great time for you to rest on your intellectual laurels. Same goes for Penrose.
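For what it's worth, "few-shot demonstrations specified purely via text" just means the prompt itself carries the examples; no weights change. A minimal sketch (the task, demonstrations, and `few_shot_prompt` helper are illustrative, not from the paper):

```python
def few_shot_prompt(demos, query):
    """Build a few-shot prompt: worked Q/A demonstrations followed by
    the new query. The 'learning' lives entirely in this text -- no
    gradient updates, exactly as the GPT-3 paper describes."""
    lines = [f"Q: {q}\nA: {a}" for q, a in demos]
    lines.append(f"Q: {query}\nA:")  # model is expected to continue here
    return "\n\n".join(lines)

# Hypothetical demonstrations of the 4-digit multiplication task
demos = [
    ("What is 2413 * 8271?", "19957923"),
    ("What is 1234 * 5678?", "7006652"),
]
print(few_shot_prompt(demos, "What is 4096 * 1024?"))
```

The point of contention above is whether the model's ability to continue such a prompt correctly counts as in-context learning (an emergent property of the frozen model) or merely post-training.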