naasking 5 days ago

> but rather the ability to reason in the general case, which requires the ability to LEARN to solve novel problems, which is what is missing from LLMs.

I don't think it's missing; zero-shot prompting is quite successful in many cases. Maybe you find the extent to which LLMs can do this too limited, but I'm not sure that means they don't reason at all.

> A system that has a fixed set of (reasoning/prediction) rules, but can't learn new ones for itself, seems better regarded as an expert system.

I think expert systems are a lot more limited than LLMs, so I don't agree with that classification. LLMs can generate output that's out of distribution, for instance, which is not something that classic expert systems can do (even if you think LLM OOD generalization is still limited compared to humans).

I've elaborated in another comment [1] on what I think part of the real issue is, and why people keep getting tripped up by saying that pattern matching is not reasoning. I think it's perfectly fine to say that pattern matching is reasoning, but pattern matching has levels of expressive power. First-order pattern matching is limited (and so the reasoning it supports is limited), and humans are clearly capable of higher-order pattern matching, which is Turing complete. Transformers are also Turing complete, and neural networks can approximate any function, so it's not a matter of expressive power, in principle.
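
To make the distinction concrete, here's a toy first-order matcher in Python (illustrative only; the tuple term encoding and the '?'-variable convention are mine, not from any library):

    # Minimal first-order matcher: terms are nested tuples,
    # pattern variables are strings starting with '?'.
    def match(pattern, term, bindings=None):
        bindings = dict(bindings or {})
        if isinstance(pattern, str) and pattern.startswith('?'):
            if pattern in bindings and bindings[pattern] != term:
                return None  # conflicting binding
            bindings[pattern] = term
            return bindings
        if (isinstance(pattern, tuple) and isinstance(term, tuple)
                and len(pattern) == len(term)):
            for p, t in zip(pattern, term):
                bindings = match(p, t, bindings)
                if bindings is None:
                    return None
            return bindings
        return bindings if pattern == term else None

    print(match(('add', '?x', '?y'), ('add', 1, 2)))
    # -> {'?x': 1, '?y': 2}

Variables here can only bind to terms. A higher-order matcher would additionally let a variable like '?f' bind to a rule or function and apply it to arguments, and that extra power is what buys Turing completeness.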

Aside from issues stemming from tokenization, I think many of these LLM failures arise because the models aren't trained in higher-order pattern matching. Thinking models and the generalization seen from grokking are first steps on this path, but it's not quite there yet.

[1] https://news.ycombinator.com/item?id=45277098

HarHarVeryFunny 5 days ago

Powerful pattern matching is still just pattern matching.

How is an LLM going to solve a novel problem with just pattern matching?

Novel means the model has never seen it before, and maybe doesn't even have the knowledge needed to solve it, so it's not going to match any pattern. Even if it did, that wouldn't help if the problem requires a solution different from whatever the matched pattern came from.

Human-level reasoning includes the ability to learn, so that people can solve novel problems, overcome failures by trial and error, explore, etc.

So whatever you are calling "reasoning" isn't human-level reasoning, and it's not clear what you are trying to say. Maybe just that you feel LLMs have room for improvement via better pattern matching?

naasking 5 days ago

> Powerful pattern matching is still just pattern matching.

Higher-order pattern matching is Turing complete. Transformers are Turing complete. Memory-augmented LLMs are Turing complete. Neural networks can approximate any function. These have all been proven.
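
To make the memory-augmentation point concrete: if a single model call can compute a Turing machine's transition function, an outer loop with external memory gives you the whole machine. A toy sketch, with a hard-coded step() standing in for what the trained model would compute (purely illustrative, not any real LLM API):

    # One "model call" per step: maps (state, symbol) to
    # (new state, symbol to write, head movement).
    def step(state, symbol):
        if state == 'scan' and symbol == '1':
            return 'scan', '1', +1   # skip over 1s, moving right
        if state == 'scan' and symbol == '_':
            return 'halt', '1', 0    # write a 1 on the blank, halt
        raise ValueError('no transition')

    def run(tape, state='scan', head=0):
        cells = dict(enumerate(tape))  # external, unbounded memory
        while state != 'halt':
            state, out, move = step(state, cells.get(head, '_'))
            cells[head] = out
            head += move
        return ''.join(cells[i] for i in sorted(cells))

    print(run('111'))  # -> '1111', a unary increment machine

The point isn't that you'd want to run a machine this way, just that nothing in the architecture caps the expressive power once you add unbounded memory.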

So if computers can be intelligent and can solve novel problems in principle, then LLMs can too if given the right training. If you don't think computers can be intelligent, you have a much higher burden to meet.

> Human-level reasoning includes the ability to learn, so that people can solve novel problems, overcome failures by trial and error, explore, etc.

You keep bringing this up as if it's lacking, but basically all existing LLM interfaces provide facilities for memory to store state. Storing progress just isn't an issue if the LLM has the right training. HN recently had articles about Claude Code being given the task of porting some GitHub repos to other programming languages; the authors woke up the next morning and it had done it autonomously, using issue tracking, progress reports, PRs, the whole nine yards. This is frankly not the hard part, IMO.

HarHarVeryFunny 5 days ago

Being Turing complete means that the system in question can emulate a Turing machine, which you could then program to do anything, since it's a universal computer. So sure, if you knew how to code up an AGI to run on a Turing machine, you would be good to go on any Turing machine!

I'm not sure why you want to run a Turing machine emulator on an LLM, when you could just write a massively faster one to run on the computer your LLM is running on, cutting out the middle man, but whatever floats your boat I suppose.

Heck, if you really like emulation and super slow speed then how about implementing Conway's game of Life to run on your LLM Turing machine emulator, and since Life is also Turing complete you could run another Turing machine emulator on that (it's been done), and finally run your AGI on top of that! Woo hoo!

I do think you'll have a challenge prompting your LLM to emulate a Turing machine (they are really not very good at that sort of thing), especially since the prompt/context will also have to do double duty as the Turing machine's (infinitely long) tape, but no doubt you'll figure it out.

Keep us posted.

I'll be excited to see your AGI program when you write that bit.

naasking 4 days ago

The point has nothing to do with speed, but with expressive power, i.e. what is achievable and learnable in principle. Again, if you accept that a computer can in principle run a program that qualifies as AGI, then all I'm saying is that an LLM with memory augmentation can in principle be trained to do the same, because their computational power is formally equivalent.

And coincidentally, a new paper being discussed on HN is a good example that addresses your concern about existing models learning and developing novel things. Here's a GPT model that learned physics just by training on data:

https://arxiv.org/abs/2509.13805

HarHarVeryFunny 4 days ago

You seem to want to say that because an LLM is Turing complete (a doubtful claim) it should be able to implement AGI, which would be a logical conclusion, yet totally irrelevant.

If the only thing missing to implement AGI were a Turing machine to run it on, then we'd already have AGI running on Conway's Game of Life, or perhaps on a Google supercomputer.

> Here's a GPT model that learned physics just by training on data

It didn't learn at run-time. It was PRE-trained, using SGD on the entire training set, the way that GPTs (Generative PRE-trained Transformers) always are.

Learning at run-time, or better yet getting rid of the distinction between pre-training and run-time, requires someone to invent (or copy from nature) a new incremental learning algorithm that:

a) Doesn't require retraining on everything it was ever previously trained on, and

b) Doesn't cause it to forget, or inappropriately change, things it had previously learnt

These are easier said than done, which is why we're a decade or so into the "deep learning" revolution and nothing much has changed, other than fine-tuning, which is still a bulk-data technique.
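
For a sense of what a partial attack on (b) looks like, one published approach is elastic weight consolidation (Kirkpatrick et al., 2017), which penalizes moving weights that an importance estimate marks as critical for earlier tasks. A rough sketch of the penalty term (old_params and fisher are per-parameter snapshots assumed to have been saved after the previous task):

    import torch

    def ewc_penalty(model, old_params, fisher, lam=1.0):
        # Quadratic pull back toward the old weights, scaled by
        # each weight's estimated importance (Fisher information).
        total = torch.zeros(())
        for name, p in model.named_parameters():
            total = total + (fisher[name] * (p - old_params[name]) ** 2).sum()
        return (lam / 2) * total

    # New-task training then minimizes:
    #   loss = task_loss + ewc_penalty(model, old_params, fisher)
    # which mitigates the forgetting in (b), but is still bulk
    # gradient training, so it doesn't address (a) at all.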