| ▲ | root_axis 12 hours ago |
| > Don’t let some factoid about how they are pretrained on autocomplete-like next token prediction fool you into thinking you understand what is going on in that trillion parameter neural network. |
| This is just an appeal to complexity, not a rebuttal of the critique of likening an LLM to a human brain. |
| > they are not “autocomplete on steroids” anymore either. |
| Yes, they are. The steroids are just even more powerful. By refining training data quality, increasing parameter count, and increasing context length we can squeeze more utility out of LLMs than ever before, but ultimately, Opus 4.5 is the same thing as GPT-2; it's just that coherence lasts a few pages rather than a few sentences. |
|
| ▲ | int_19h 8 hours ago | parent | next [-] |
| > ultimately, Opus 4.5 is the same thing as GPT-2; it's just that coherence lasts a few pages rather than a few sentences. |
| This tells me that you haven't really used Opus 4.5 at all. |
|
| ▲ | baq 10 hours ago | parent | prev | next [-] |
| First, this is completely ignoring text diffusion and nano banana. Second, to autocomplete the name of the killer in a detective novel that isn't in the training set requires following the plot and at least some understanding of it. |
| |
|
| ▲ | dash2 11 hours ago | parent | prev | next [-] |
| This would be true if all training were based on sentence completion. But training involving RLHF and RLAIF is increasingly important, isn't it? |
| |
| ▲ | root_axis 10 hours ago | parent [-] | | Reinforcement learning is a technique for adjusting weights, but it does not alter the architecture of the model. No matter how much RL you do, you still retain all the fundamental limitations of next-token prediction (e.g. context exhaustion, hallucinations, prompt-injection vulnerability, etc.). | | |
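(A minimal, hypothetical sketch of the mechanics under debate here, assuming PyTorch; `TinyLM`, the random "text", and the rewards are invented for illustration and are not anyone's actual training code. The narrow point it shows: whether the gradient comes from next-token cross-entropy or from an RL-style reward, the thing being updated is the same autoregressive next-token model.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 100, 32

class TinyLM(nn.Module):
    """Stand-in autoregressive LM: embed tokens, emit logits for the next token."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                 # tokens: (batch, seq)
        return self.head(self.embed(tokens))   # logits: (batch, seq, VOCAB)

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
tokens = torch.randint(0, VOCAB, (2, 16))      # toy "training text"

# Pretraining-style step: plain next-token cross-entropy.
logits = model(tokens[:, :-1])
ce_loss = F.cross_entropy(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
ce_loss.backward(); opt.step(); opt.zero_grad()

# RL-style step (REINFORCE-flavoured): sample from the model, score the sample
# with some external reward (a human rating, a passing test suite, ...), and
# push up the log-probability of whatever got rewarded.
with torch.no_grad():
    sampled = torch.distributions.Categorical(logits=model(tokens)).sample()
reward = torch.randn(2)                        # placeholder score per sequence
log_prob = torch.distributions.Categorical(logits=model(tokens)).log_prob(sampled).sum(dim=1)
rl_loss = -(reward * log_prob).mean()
rl_loss.backward(); opt.step(); opt.zero_grad()

# Both steps move the very same parameters of the very same next-token model;
# only the signal used to move them differs.
```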
| ▲ | hexaga 5 hours ago | parent [-] | | You've confused yourself. Those problems are not fundamental to next-token prediction; they are fundamental to reconstruction losses on large, general text corpora. That is to say, they are just as likely if you don't do next-token prediction at all and instead do text diffusion or something else. Architecture has nothing to do with it. |
| They arise because they are early, partial solutions to the reconstruction task on 'all the text ever made'. The reconstruction task doesn't care much about truthiness until very late in the loss curve (a point we will probably never reach), so hallucinations are almost as good as the truth for a very long time. |
| RL as typically used in post-training _does not share those early solutions_, and so does not share the fundamental problems. RL (in this context) has its own, different problems, chiefly reward hacks: reliance on meta signaling (# Why X is the correct solution, the honest answer ...), lying (commenting out tests), manipulation (You're absolutely right!), etc. Anything to make the human press the upvote button or make the test suite pass, at any cost. |
| With that said, RL post-trained models _inherit_ the problems of non-optimal large-corpora reconstruction solutions, but they don't introduce more of them or make them worse in any directed way. There's no reason to think those problems are inevitable, and in principle you can cut away the garbage with the right RL target. |
| Thinking about architecture at all (autoregressive CE, RL, transformers, etc.) is the wrong level of abstraction for understanding model behavior: instead, think about loss surfaces (large-corpora reconstruction, human agreement, test suites passing, etc.) and what solutions exist early and late in training for them. |
|
|
|
| ▲ | A4ET8a8uTh0_v2 12 hours ago | parent | prev | next [-] |
| But... and I am not asking this for giggles: does it mean humans are giant autocomplete machines? |
| |
| ▲ | root_axis 10 hours ago | parent [-] | | Not at all. Why would it? | | |
| ▲ | A4ET8a8uTh0_v2 10 hours ago | parent [-] | | Call it a.. thought experiment about the question of scale. | | |
| ▲ | root_axis 10 hours ago | parent [-] | | I'm not exactly sure what you mean. Could you please elaborate further? | | |
| ▲ | a1j9o94 10 hours ago | parent [-] | | Not the person you're responding to, but I think there's a non-trivial argument to be made that our thoughts are just autocomplete: what is the next most likely word, given what you're seeing? Ever watched a movie and guessed the plot? Or read a comment and known where it was going by the end? And I know not everyone thinks in a literal stream of words all the time (I do), but I would argue that those people's brains are just using a different "token". | | |
| ▲ | root_axis 9 hours ago | parent | next [-] | | There's no evidence for it, nor any explanation for why it should be the case from a biological perspective. Tokens are an artifact of computer science that have no reason to exist inside humans. Human minds don't need a discrete dictionary of reality in order to model it. Prior to LLMs, there was never any suggestion that thoughts work like autocomplete, but now people are working backwards from that conclusion based on metaphorical parallels. | | |
| ▲ | LiKao 8 hours ago | parent | next [-] | | There actually was quite a lot of suggestion that thoughts work like autocomplete. A lot of it was just considered niche, e.g. because the mathematical formalisms were beyond what most psychologists or even cognitive scientists would deem useful. Predictive coding theory was formalized back around 2010 and traces its roots to theories by Helmholtz from the 1860s. Predictive coding postulates that our brains are just very strong prediction machines, with multiple layers of predictive machinery, each layer predicting the next. |
| ▲ | red75prime 8 hours ago | parent | prev | next [-] | | There are so many theories regarding human cognition that you can certainly find something that is close to "autocomplete". A Hopfield network, for example. The roots of predictive coding theory extend back to the 1860s. Natalia Bekhtereva wrote about compact concept representations in the brain akin to tokens. |
| ▲ | A4ET8a8uTh0_v2 5 hours ago | parent | prev [-] | | << There's no evidence for it |
| Fascinating framing. What would you consider evidence here? |
| |
| ▲ | 9dev 9 hours ago | parent | prev [-] | | You, and OP, are taking an analogy way too far. Yes, humans have the mental capability to predict words, similar to autocomplete, but obviously this is just one out of a myriad of mental capabilities typical humans have, many of which have nothing to do with text. You can predict where a ball will go if you throw it, you can reason about gravity, and so much more. It’s not just apples to oranges, not even apples to boats; it’s apples to intersubjective realities. | |
| ▲ | A4ET8a8uTh0_v2 5 hours ago | parent | next [-] | | I don't think I am. To be honest, as ideas go, and as I swirl it around that empty head of mine, this one ain't half bad, given how much immediate resistance it generates. Other posters already noted other reasons for it, but I will note that you are saying 'similar to autocomplete, but obviously', which suggests you recognize the shape and immediately dismiss it as not the same, because the shape you know in humans is much more evolved and can do more things. Ngl man, as arguments go, it sounds to me like supercharged autocomplete that was allowed to develop over a number of years. | |
| ▲ | 9dev 4 hours ago | parent [-] | | Fair enough. To someone with a background in biology, it sounds like an argument made by a software engineer with no actual knowledge of cognition, psychology, biology, or any related field, jumping to misguided conclusions driven only by shallow insights and their own experience in computer science. Or in other words, this thread sure attracts a lot of armchair experts. | |
| ▲ | quesera 26 minutes ago | parent [-] | | > with no actual knowledge of cognition, psychology, biology ... |
| but we also need to be careful with that assertion, because humans do not understand cognition, psychology, or biology very well. Biology is the furthest developed, but it turns out to be like physics -- superficially and usefully modelable, but fundamental mysteries remain. We have no idea how complete our models are, but they work pretty well in our standard context. If computer engineering is downstream from physics, and cognition is downstream from biology ... well, I just don't know how certain we can be about much of anything. |
| > this thread sure attracts a lot of armchair experts. |
| "So we beat on, boats against the current, borne back ceaselessly into our priors..." |
|
| |
| ▲ | LiKao 8 hours ago | parent | prev [-] | | Look up predictive coding theory. According to that theory, what our brain does is in fact just autocomplete. However, what it is doing is layered autocomplete on itself: one part is trying to predict what another part will produce, and it trains itself on this kind of prediction. What emerges from these layered autocompletes is what we call thought. |
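(A toy illustration of the layered-prediction idea, in the spirit of predictive-coding models such as Rao & Ballard's; the dimensions, learning rates, and random data are illustrative assumptions, not a model of any real brain. One "higher" layer holds a state, predicts the activity of the layer below, and both the state and the weights are nudged to shrink the prediction error; the full theory stacks many such layers, each predicting the one beneath it.)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)                 # "sensory" input arriving at the lower layer
W = 0.1 * rng.normal(size=(8, 4))      # top-down generative weights: latent state -> predicted input
z = np.zeros(4)                        # higher-layer latent state ("what I expect to see")

print("initial prediction error:", float(np.linalg.norm(x - W @ z)))

for _ in range(300):
    pred = W @ z                       # the higher layer's prediction of the input
    err = x - pred                     # prediction error passed back up
    z = z + 0.1 * (W.T @ err)          # fast state update: adjust beliefs to explain the error
    W = W + 0.01 * np.outer(err, z)    # slow weight learning driven by the same error signal

print("final prediction error:  ", float(np.linalg.norm(x - W @ z)))
```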
|
|
|
|
|
|
|
| ▲ | NiloCK 10 hours ago | parent | prev [-] |
| First: a selection mechanism is just a selection mechanism, and it shouldn't confuse the observation of emergent, tangential capabilities. Probably you believe that humans have something called intelligence, but the pressure that produced it - the likelihood that specific genetic material replicates - is much more tangential to intelligence than next-token prediction is. I doubt many alien civilizations would look at us and say "not intelligent - they're just genetic information replication on steroids". |
| Second: modern models also undergo a ton of post-training now: RLHF, mechanized fine-tuning on specific use cases, and so on. It's just not correct that the token-prediction loss function is "the whole thing". |
| |
| ▲ | root_axis 10 hours ago | parent [-] | | > First: a selection mechanism is just a selection mechanism, and it shouldn't confuse the observation of emergent, tangential capabilities. |
| Invoking terms like "selection mechanism" is begging the question, because it implicitly likens next-token-prediction training to natural selection, but in reality the two are so fundamentally different that the analogy has only metaphorical meaning. Even at a conceptual level, gradient descent gradually homing in on a known target is comically trivial compared to the blind filter of natural selection sorting out the chaos of chemical biology. It's like comparing Legos to DNA. |
| > Second: modern models also undergo a ton of post-training now: RLHF, mechanized fine-tuning on specific use cases, and so on. It's just not correct that the token-prediction loss function is "the whole thing". |
| RL is still token prediction; it's just a technique for adjusting the weights to align with predictions that you can't model a loss function for during pre-training. When RL rewards good output, it's increasing the statistical strength of the model for an arbitrary purpose, but ultimately what is achieved is still a brute-force quadratic lookup for every token in the context. |
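(Reading "quadratic lookup" as a reference to attention's cost, here is a minimal sketch: plain single-head scaled dot-product attention in NumPy, with made-up sizes and no causal mask. Every position is scored against every other position, so the score matrix grows with the square of the context length; post-training, RL or otherwise, changes the weights that produce q, k, and v, not this computation or its cost.)

```python
import numpy as np

def self_attention(q, k, v):
    """Score every query position against every key position (causal mask omitted for brevity)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (seq_len, seq_len) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)     # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ v

rng = np.random.default_rng(0)
seq_len, dim = 1024, 64
q, k, v = (rng.normal(size=(seq_len, dim)) for _ in range(3))

out = self_attention(q, k, v)
print(out.shape, "pairwise scores computed:", seq_len * seq_len)
```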
|