Remix.run Logo
libraryofbabel 14 hours ago

This is the 2023 take on LLMs. It still gets repeated a lot. But it doesn’t really hold up anymore - it’s more complicated than that. Don’t let some factoid about how they are pretrained on autocomplete-like next token prediction fool you into thinking you understand what is going on in that trillion parameter neural network.

Sure, LLMs do not think like humans and they may not have human-level creativity. Sometimes they hallucinate. But they can absolutely solve new problems that aren’t in their training set, e.g. some rather difficult problems on the last Mathematical Olympiad. They don’t just regurgitate remixes of their training data. If you don’t believe this, you really need to spend more time with the latest SotA models like Opus 4.5 or Gemini 3.

Nontrivial emergent behavior is a thing. It will only get more impressive. That doesn’t make LLMs like humans (and we shouldn’t anthropomorphize them) but they are not “autocomplete on steroids” anymore either.

root_axis 12 hours ago | parent | next [-]

> Don’t let some factoid about how they are pretrained on autocomplete-like next token prediction fool you into thinking you understand what is going on in that trillion parameter neural network.

This is just an appeal to complexity, not a rebuttal to the critique of likening an LLM to a human brain.

> they are not “autocomplete on steroids” anymore either.

Yes, they are. The steroids are just even more powerful. By refining training data quality, increasing parameter size, and increasing context length we can squeeze more utility out of LLMs than ever before, but ultimately, Opus 4.5 is the same thing as GPT2, it's only that coherence lasts a few pages rather than a few sentences.

int_19h 8 hours ago | parent | next [-]

> ultimately, Opus 4.5 is the same thing as GPT2, it's only that coherence lasts a few pages rather than a few sentences.

This tells me that you haven't really used Opus 4.5 at all.

baq 10 hours ago | parent | prev | next [-]

First, this is completely ignoring text diffusion and nano banana.

Second, to autocomplete the name of the killer in a detective book outside of the training set requires following and at least some understanding of the plot.

7 hours ago | parent [-]
[deleted]
dash2 11 hours ago | parent | prev | next [-]

This would be true if all training were based on sentence completion. But training involving RLHF and RLAIF is increasingly important, isn't it?

root_axis 10 hours ago | parent [-]

Reinforcement learning is a technique for adjusting weights, but it does not alter the architecture of the model. No matter how much RL you do, you still retain all the fundamental limitations of next-token prediction (e.g. context exhaustion, hallucinations, prompt injection vulnerability etc)

hexaga 5 hours ago | parent [-]

You've confused yourself. Those problems are not fundamental to next token prediction, they are fundamental to reconstruction losses on large general text corpora.

That is to say, they are equally likely if you don't do next token prediction at all and instead do text diffusion or something. Architecture has nothing to do with it. They arise because they are early partial solutions to the reconstruction task on 'all the text ever made'. Reconstruction task doesn't care much about truthiness until way late in the loss curve (where we probably will never reach), so hallucinations are almost as good for a very long time.

RL as is typical in post-training _does not share those early solutions_, and so does not share the fundamental problems. RL (in this context) has its own share of problems which are different, such as reward hacks like: reliance on meta signaling (# Why X is the correct solution, the honest answer ...), lying (commenting out tests), manipulation (You're absolutely right!), etc. Anything to make the human press the upvote button or make the test suite pass at any cost or whatever.

With that said, RL post-trained models _inherit_ the problems of non-optimal large corpora reconstruction solutions, but they don't introduce more or make them worse in a directed manner or anything like that. There's no reason to think them inevitable, and in principle you can cut away the garbage with the right RL target.

Thinking about architecture at all (autoregressive CE, RL, transformers, etc) is the wrong level of abstraction for understanding model behavior: instead, think about loss surfaces (large corpora reconstruction, human agreement, test suites passing, etc) and what solutions exist early and late in training for them.

A4ET8a8uTh0_v2 12 hours ago | parent | prev | next [-]

But.. and I am not asking it for giggles, does it mean humans are giant autocomplete machines?

root_axis 10 hours ago | parent [-]

Not at all. Why would it?

A4ET8a8uTh0_v2 10 hours ago | parent [-]

Call it a.. thought experiment about the question of scale.

root_axis 10 hours ago | parent [-]

I'm not exactly sure what you mean. Could you please elaborate further?

a1j9o94 10 hours ago | parent [-]

Not the person you're responding to, but I think there's a non trivial argument to make that our thoughts are just auto complete. What is the next most likely word based on what you're seeing. Ever watched a movie and guessed the plot? Or read a comment and know where it was going to go by the end?

And I know not everyone thinks in a literal stream of words all the time (I do) but I would argue that those people's brains are just using a different "token"

root_axis 9 hours ago | parent | next [-]

There's no evidence for it, nor any explanation for why it should be the case from a biological perspective. Tokens are an artifact of computer science that have no reason to exist inside humans. Human minds don't need a discrete dictionary of reality in order to model it.

Prior to LLMs, there was never any suggestion that thoughts work like autocomplete, but now people are working backwards from that conclusion based on metaphorical parallels.

LiKao 8 hours ago | parent | next [-]

There actually was quite a lot of suggestion that thoughts work like autocomplete. A lot of it was just considered niche, e.g. because the mathematical formalisms were beyond what most psychologist or even cognitive scientists would deem usefull.

Predictive coding theory was formalized back around 2010 and traces it roots up to theories by Helmholtz from 1860.

Predictive coding theory postulates that our brains are just very strong prediction machines, with multiple layers of predictive machinery, each predicting the next.

red75prime 8 hours ago | parent | prev | next [-]

There are so many theories regarding human cognition that you can certainly find something that is close to "autocomplete". A Hopfield network, for example.

Roots of predictive coding theory extend back to 1860s.

Natalia Bekhtereva was writing about compact concept representations in the brain akin to tokens.

A4ET8a8uTh0_v2 5 hours ago | parent | prev [-]

<< There's no evidence for it

Fascinating framing. What would you consider evidence here?

9dev 9 hours ago | parent | prev [-]

You, and OP, are taking an analogy way too far. Yes, humans have the mental capability to predict words similar to autocomplete, but obviously this is just one out of a myriad of mental capabilities typical humans have, which work regardless of text. You can predict where a ball will go if you throw it, you can reason about gravity, and so much more. It’s not just apples to oranges, not even apples to boats, it’s apples to intersubjective realities.

A4ET8a8uTh0_v2 5 hours ago | parent | next [-]

I don't think I am. To be honest, as ideas goes and I swirl it around that empty head of mine, this one ain't half bad given how much immediate resistance it generates.

Other posters already noted other reasons for it, but I will note that you are saying 'similar to autocomplete, but obviously' suggesting you recognize the shape and immediately dismissing it as not the same, because the shape you know in humans is much more evolved and co do more things. Ngl man, as arguments go, it sounds to me like supercharged autocomplete that was allowed to develop over a number of years.

9dev 3 hours ago | parent [-]

Fair enough. To someone with a background in biology, it sounds like an argument made by a software engineer with no actual knowledge of cognition, psychology, biology, or any related field, jumping to misled conclusions driven only by shallow insights and their own experience in computer science.

Or in other words, this thread sure attracts a lot of armchair experts.

quesera 23 minutes ago | parent [-]

> with no actual knowledge of cognition, psychology, biology

... but we also need to be careful with that assertion, because humans do not understand cognition, psychology, or biology very well.

Biology is the furthest developed, but it turns out to be like physics -- superficially and usefully modelable, but fundamental mysteries remain. We have no idea how complete our models are, but they work pretty well in our standard context.

If computer engineering is downstream from physics, and cognition is downstream from biology ... well, I just don't know how certain we can be about much of anything.

> this thread sure attracts a lot of armchair experts.

"So we beat on, boats against the current, borne back ceaselessly into our priors..."

LiKao 8 hours ago | parent | prev [-]

Look up predictive coding theory. According to that theory, what our brain does is in fact just autocomplete.

However, what it is doing is layered autocomplete on itself. I.e. one part is trying to predict what the other part will be producing and training itself on this kind of prediction.

What emerges from this layered level of autocompletes is what we call thought.

NiloCK 10 hours ago | parent | prev [-]

First: a selection mechanism is just a selection mechanism, and it shouldn't confuse the observation of an emergent, tangential capabilities.

Probably you believe that humans have something called intelligence, but the pressure that produced it - the likelihood of specific genetic material to replicate - it is much more tangential to intelligence than next-token-prediction.

I doubt many alien civilizations would look at us and say "not intelligent - they're just genetic information replication on steroids".

Second: modern models also under go a ton of post-training now. RLHF, mechanized fine-tuning on specific use cases, etc etc. It's just not correct that token-prediction loss function is "the whole thing".

root_axis 10 hours ago | parent [-]

> First: a selection mechanism is just a selection mechanism, and it shouldn't confuse the observation of an emergent, tangential capabilities.

Invoking terms like "selection mechanism" is begging the question because it implicitly likens next-token-prediction training to natural selection, but in reality the two are so fundamentally different that the analogy only has metaphorical meaning. Even at a conceptual level, gradient descent gradually honing in on a known target is comically trivial compared to the blind filter of natural selection sorting out the chaos of chemical biology. It's like comparing legos to DNA.

> Second: modern models also under go a ton of post-training now. RLHF, mechanized fine-tuning on specific use cases, etc etc. It's just not correct that token-prediction loss function is "the whole thing".

RL is still token prediction, it's just a technique for adjusting the weights to align with predictions that you can't model a loss function for in per-training. When RL rewards good output, it's increasing the statistical strength of the model for an arbitrary purpose, but ultimately what is achieved is still a brute force quadratic lookup for every token in the context.

vachina 5 hours ago | parent | prev | next [-]

I use enterprise LLM provided by work, working on very proprietary codebase on a semi esoteric language. My impression is it is still a very big autocompletion machine.

You still need to hand hold it all the way as it is only capable of regurgitating the tiny amount of code patterns it saw in the public. As opposed to say a Python project.

deadbolt 13 hours ago | parent | prev | next [-]

As someone who still might have a '2023 take on LLMs', even though I use them often at work, where would you recommend I look to learn more about what a '2025 LLM' is, and how they operate differently?

krackers 9 hours ago | parent | next [-]

Papers on mechanistic interpratability and representation engineering, e.g. from Anthropic would be a good start.

otabdeveloper4 8 hours ago | parent | prev [-]

Don't bother. This bubble will pop in two years, you don't want to look back on your old comments in shame in three.

otabdeveloper4 9 hours ago | parent | prev | next [-]

> it’s more complicated than that.

No it isn't.

> ...fool you into thinking you understand what is going on in that trillion parameter neural network.

It's just matrix multiplication and logistic regression, nothing more.

hackinthebochs 6 hours ago | parent [-]

LLMs are a general purpose computing paradigm. LLMs are circuit builders, the converged parameters define pathways through the architecture that pick out specific programs. Or as Karpathy puts it, LLMs are a differentiable computer[1]. Training LLMs discovers programs that well reproduce the input sequence. Roughly the same architecture can generate passable images, music, or even video.

The sequence of matrix multiplications are the high level constraint on the space of programs discoverable. But the specific parameters discovered are what determines the specifics of information flow through the network and hence what program is defined. The complexity of the trained network is emergent, meaning the internal complexity far surpasses that of the course-grained description of the high level matmul sequences. LLMs are not just matmuls and logits.

[1] https://x.com/karpathy/status/1582807367988654081

otabdeveloper4 5 hours ago | parent [-]

> LLMs are a general purpose computing paradigm.

Yes, so is logistic regression.

hackinthebochs 4 hours ago | parent [-]

No, not at all.

dingnuts 13 hours ago | parent | prev | next [-]

[dead]

beernet 4 hours ago | parent | prev [-]

>> Sometimes they hallucinate.

For someone speaking as you knew everything, you appear to know very little. Every LLM completion is a "hallucination", some of them just happen to be factually correct.