nl a day ago

> I don't think it's fair to say that at all. How are LLMs not statistical models that predict tokens? It's a big oversimplification but it doesn't seem wrong

Modern LLMs are also trained with reinforcement learning after pretraining, and there the training objective is no longer maximizing next-token probability.
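
A minimal sketch of the contrast, with toy PyTorch tensors (nothing here is from any real training stack): pretraining scores every position against a known next token, while the RL phase samples a completion and scores it with a single scalar reward.

    import torch
    import torch.nn.functional as F

    vocab, seq_len = 1000, 8
    logits = torch.randn(seq_len, vocab)           # model outputs, one row per position
    targets = torch.randint(0, vocab, (seq_len,))  # "ground truth" next tokens

    # Pretraining objective: per-token cross-entropy, i.e. maximize the
    # probability of each observed next token given its prefix.
    pretrain_loss = F.cross_entropy(logits, targets)

    # RL objective: no per-token targets at all. A sampled completion gets
    # one scalar reward, and the policy gradient weights the log-probability
    # of the whole sequence by that reward (plain REINFORCE here).
    log_probs = F.log_softmax(logits, dim=-1)
    sampled = torch.multinomial(log_probs.exp(), 1).squeeze(-1)
    seq_log_prob = log_probs[torch.arange(seq_len), sampled].sum()
    reward = 1.0  # stand-in for a reward-model or verifier score on the full output
    rl_loss = -reward * seq_log_prob

The gradient of rl_loss pushes up whole sequences that score well, which is a different pressure than matching each next token in a corpus.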

They still produce tokens sequentially (ignoring diffusion models for now), but since the objective is so different, thinking of them as next-token predictors is more wrong than right.

Instead one has to think of them as trying to fit their entire output to what was learned in the reinforcement phase: the reward attaches to the completion as a whole, not to each token in isolation. That's why reasoning in LLMs works so well.
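
And a sketch of why sequential decoding doesn't make the objective per-token (everything below, including the tiny "model" and the dummy reward function, is a made-up stand-in): tokens are still sampled one at a time, but the training signal only exists once the output is finished.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    table = torch.randn(50, 50)           # hypothetical toy "model": 50-token vocab
    model = lambda ids: table[ids[-1]]    # next-token logits from the last token only
    reward_model = lambda ids: len(set(ids)) / len(ids)  # dummy whole-sequence scorer

    ids, log_probs, eos = [1], [], 0
    for _ in range(16):                   # decoding is still one token at a time
        probs = F.softmax(model(ids), dim=-1)
        tok = torch.multinomial(probs, 1).item()
        log_probs.append(torch.log(probs[tok]))
        ids.append(tok)
        if tok == eos:
            break

    # The reward is a single score for the completed sequence:
    reward = reward_model(ids)
    loss = -reward * torch.stack(log_probs).sum()  # credit/blame the whole output at once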