| ▲ | xg15 4 days ago |
| I don't want to say that LLMs can reason, but this kind of argument always feels too shallow to me. It's kind of like saying that bats cannot possibly fly because they have no feathers, or that birds cannot have higher cognitive functions because they have no neocortex. (The latter was an actual longstanding belief in science which was disproven only a decade or so ago.) The "next token prediction" is just the API; it doesn't tell you anything about the complexity of the thing that actually does the prediction. (I think there is some temptation to view LLMs as glorified Markov chains - they aren't. They just "implement the same API" as Markov chains.) There is still a limit to how much an LLM can reason during the prediction of a single token, as there is no recurrence between layers, so information can only be passed "forward". But this limit doesn't exist if you consider the generation of the entire text: suddenly you do have a recurrence, which is the prediction loop itself: the LLM can "store" information in a generated token and receive that information back as input in the next loop iteration. I think this structure makes it quite hard to really say how much reasoning is possible. |
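A minimal sketch of that outer recurrence (the names here are placeholders, not any real model API): within a single call information only flows forward through the layers, but the emitted token is appended and fed back on the next iteration, which is what lets a generated token act as scratch memory.

```python
def predict_next_token(model, tokens):
    # Placeholder for a full forward pass of some model; a toy rule so the
    # sketch runs. Inside one call there is no recurrence between layers.
    return (tokens[-1] + 1) % model["vocab_size"]

def generate(model, prompt_tokens, max_new_tokens, stop_token=None):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = predict_next_token(model, tokens)
        tokens.append(next_token)  # fed back as input next iteration: the recurrence
        if next_token == stop_token:
            break
    return tokens

print(generate({"vocab_size": 10}, [0], 5))  # [0, 1, 2, 3, 4, 5]
```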
|
| ▲ | griomnib 4 days ago | parent | next [-] |
| I agree with most of what you said, but “LLM can reason” is an insanely huge claim to make and most of the “evidence” so far is a mixture of corporate propaganda, “vibes”, and the like. I’ve yet to see anything close to the level of evidence needed to support the claim. |
| |
▲ | vidarh 4 days ago | parent | next [-] | | To say any specific LLM can reason is a somewhat significant claim. To say LLMs as a class are architecturally able to be trained to reason is - in the complete absence of evidence to suggest humans can compute functions outside the Turing computable - effectively only a claim that they can implement a minimal Turing machine, given that the context is used as IO. Given the size of the rule set needed to implement the smallest known universal Turing machines, it'd take a really tiny model for them to be unable to. Now, you can then argue that it doesn't "count" if it needs to be fed a huge program step by step via IO, but if it can do something that way, I'd need some really convincing evidence for why the static elements of those steps could not progressively be embedded into a model. | | |
| ▲ | wizzwizz4 3 days ago | parent [-] | | No such evidence exists: we can construct such a model manually. I'd need some quite convincing evidence that any given training process is approximately equivalent to that, though. | | |
| ▲ | vidarh 3 days ago | parent [-] | | That's fine. I've made no claim about any given training process. I've addressed the annoying repetitive dismissal via the "but they're next token predictors" argument. The point is that being next token predictors does not limit their theoretical limits, so it's a meaningless argument. | | |
| ▲ | wizzwizz4 3 days ago | parent [-] | | The architecture of the model does place limits on how much computation can be performed per token generated, though. Combined with the window size, that's a hard bound on computational complexity that's significantly lower than a Turing machine – unless you do something clever with the program that drives the model. | | |
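As a rough way to make that bound concrete, using standard transformer cost accounting (the symbols L for layer count, d for hidden size, and n for context length are introduced here, not taken from the thread):

```latex
% Work to emit one token: attention over at most n cached positions, plus the MLPs.
C_{\text{token}} = O\bigl(L\,(n d + d^{2})\bigr)
% Work for a full generation confined to a single window of length n:
C_{\text{window}} = O\bigl(L\,(n^{2} d + n d^{2})\bigr)
```

That is a fixed polynomial in the window size; only the outer generation loop, or a program driving the model through its context, lifts it.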
| ▲ | vidarh 3 days ago | parent [-] | | Hence the requirement for using the context for IO. A Turing machine requires two memory "slots" (the position of the read head, and the current state) + IO and a loop. That doesn't require much cleverness at all. |
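A minimal sketch of that structure: the only mutable values across iterations are the head position and the current state, the tape plays the role of IO, and the rule table is the "program". The rules shown are the standard 2-state, 2-symbol busy beaver, chosen here just as a small example.

```python
from collections import defaultdict

# Rule table: (state, symbol) -> (symbol to write, head move, next state).
RULES = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "HALT"),
}

def run(rules, max_steps=1000):
    tape = defaultdict(int)   # the "IO": an unbounded tape, blank cells read as 0
    head, state = 0, "A"      # the two memory "slots": head position + current state
    for _ in range(max_steps):
        if state == "HALT":
            break
        symbol, move, state = rules[(state, tape[head])]
        tape[head] = symbol
        head += move
    return tape, state

tape, state = run(RULES)
print(state, sorted(tape.items()))  # HALT after six steps, four 1s on the tape
```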
|
|
|
| |
| ▲ | int_19h 3 days ago | parent | prev | next [-] | | "LLM can reason" is trivially provable - all you need to do is give it a novel task (e.g. a logical puzzle) that requires reasoning, and observe it solving that puzzle. | | |
| ▲ | staticman2 3 days ago | parent [-] | | How do you intend to show your task is novel? | | |
| ▲ | int_19h 19 hours ago | parent [-] | | "Novel" here simply means that the exact sequence of moves that is the solution cannot possibly be in the training set (mutatis mutandis). You can easily write a program that generates these kinds of puzzles at random, and feed them to the model. |
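A sketch of that kind of generator, using a randomly sampled ordering puzzle as the example (the puzzle type is an arbitrary choice here, and `ask_model` is a hypothetical stand-in for whatever LLM client you use): because each instance is freshly sampled, its specific solution can't simply be looked up from the training set.

```python
import random

NAMES = ["Ava", "Ben", "Carla", "Dev", "Ezra", "Fay", "Gus", "Hana"]

def make_puzzle(n=5, seed=None):
    """Sample a hidden total order and emit pairwise clues that pin it down."""
    rng = random.Random(seed)
    people = rng.sample(NAMES, n)          # hidden order: people[0] is tallest
    clues = [f"{people[i]} is taller than {people[i + 1]}." for i in range(n - 1)]
    rng.shuffle(clues)
    prompt = " ".join(clues) + " Who is the tallest?"
    return prompt, people[0]

def is_correct(model_reply, answer):
    return answer.lower() in model_reply.lower()

puzzle, answer = make_puzzle(seed=7)
print(puzzle)
# reply = ask_model(puzzle)        # hypothetical LLM call; plug in any client
# print(is_correct(reply, answer))
```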
|
| |
| ▲ | hackinthebochs 3 days ago | parent | prev | next [-] | | Then say "no one has demonstrated that LLMs can reason" instead of "LLMs can't reason, they're just token predictors". At least that would be intellectually honest. | | |
| ▲ | Xelynega 3 days ago | parent [-] | | By that logic isn't it "intellectually dishonest" to say "dowsing rods don't work" if the only evidence we have is examples of them not working? | | |
▲ | hackinthebochs 3 days ago | parent [-] | | Not really. We know enough about how the world works to know that dowsing rods have no plausible mechanism of action. We do not know enough about intelligence/reasoning or how brains work to know that LLMs definitely aren't doing anything resembling that. |
|
| |
▲ | Propelloni 4 days ago | parent | prev [-] | | It's largely dependent on what we think "reason" means, is it not? That's not an argument in favor from me; in my world, LLMs are stochastic parrots. |
|
|
| ▲ | vidarh 4 days ago | parent | prev [-] |
| > But this limit doesn't exist if you consider the generation of the entire text: Suddenly, you do have a recurrence, which is the prediction loop itself: The LLM can "store" information in a generated token and receive that information back as input in the next loop iteration. Now consider that you can trivially get an LLM to "execute" one step of a Turing machine where the context is used as an IO channel, and you will have shown it to be Turing complete. > I think this structure makes it quite hard to really say how much reasoning is possible. Given the above, I think any argument that they can't be made to reason is effectively an argument that humans can compute functions outside the Turing computable set, which we haven't the slightest shred of evidence to suggest. |
| |
▲ | Xelynega 3 days ago | parent [-] | | It's kind of ridiculous to say that functions computable by Turing machines are the only ones that can exist (and that trained LLMs are Turing machines). What evidence do you have for either of these? I don't recall any proof that "functions computable by Turing machines" is equal to the set of all functions that can exist, and I don't recall pretrained LLMs being proven to be Turing machines. | | |
▲ | vidarh 3 days ago | parent [-] | | We don't have hard proof that no computable functions exist outside the Turing computable set, but we have no examples of any such function, and no theory for how to even begin to formulate one. As it stands, Church, Turing, and Kleene have proven that the set of general recursive functions, the lambda calculus, and the Turing computable set are equivalent, and no attempt to characterize computable functions outside those sets has succeeded since. If you want your name in the history books, all you need to do is find a single function that humans can compute that is outside the Turing computable set. As for LLMs, you can trivially test that they can act like a Turing machine if you give them a loop and use the context to provide access to IO: turn the temperature down, and formulate a prompt that asks one to follow the rules of the smallest known universal Turing machine. A reminder that the smallest known universal Turing machine is a 2-state, 3-symbol machine. It's quite hard to find a system that can carry out any kind of complex function that can't act like a Turing machine if you allow it to loop and give it access to IO. |
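A sketch of that experiment, with `llm_complete(prompt, temperature)` as a hypothetical stand-in for a real chat-completion client (not an actual API), and with the small 2-state, 2-symbol rule table from earlier spelled out in the prompt rather than the 2-state, 3-symbol machine's rules. The transcript is the IO channel, and the outer loop supplies the recurrence.

```python
RULES_TEXT = """You are executing one step of a Turing machine.
Rules (state,symbol -> write,move,next state):
A,0 -> 1,R,B   A,1 -> 1,L,B   B,0 -> 1,L,A   B,1 -> 1,R,HALT
Reply with exactly one line: state=<state> head=<int> tape=<comma-separated symbols>"""

def step_via_llm(llm_complete, state, head, tape):
    # `llm_complete(prompt, temperature)` is a hypothetical hook; plug in any client.
    prompt = (f"{RULES_TEXT}\n"
              f"Current: state={state} head={head} tape={','.join(map(str, tape))}")
    reply = llm_complete(prompt, temperature=0)
    fields = dict(part.split("=", 1) for part in reply.split())
    return fields["state"], int(fields["head"]), [int(s) for s in fields["tape"].split(",")]

def run_machine(llm_complete, max_steps=100):
    state, head, tape = "A", 0, [0]
    for _ in range(max_steps):
        if state == "HALT":
            break
        state, head, tape = step_via_llm(llm_complete, state, head, tape)
        if head < 0:                  # the head moves one cell per step, so
            tape.insert(0, 0)         # growing the tape by one cell suffices
            head = 0
        elif head >= len(tape):
            tape.append(0)
    return state, tape
```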
|
|