I mean yeah, it’s a good essay in that it made me think and try to articulate the gaps, and I’m always looking to read things that push back on AI hype. I usually just skip over the hype blogging.

I think my biggest complaint is that the essay points out flaws in LLMs’ world models (totally valid: they do confidently get things wrong and hallucinate in ways that are different from, and often more frustrating than, how humans get things wrong) but then jumps to claiming that there is some fundamental limitation of LLMs that prevents them from forming workable world models. In particular, it strays a bit towards the “they’re just stochastic parrots” critique, e.g. “that just shows the LLM knows to put the words explaining it after the words asking the question.” That just doesn’t seem to hold up in the face of, e.g., LLMs getting gold at the International Mathematical Olympiad, which features novel questions. If that isn’t a world model of mathematics (being able to apply learned techniques to challenging new questions), then I don’t know what is.

A lot of that success is from reinforcement learning techniques where the LLM is made to solve tons of math problems after the pre-training “read everything” step, which then gives it a chance to update its weights. LLMs aren’t just trained from reading a lot of text anymore. It’s very similar to how the alpha zero chess engine was trained, in fact.

I do think there’s a lot that the essay gets right. If I was to recast it, I’d put it something like this:

* LLMs have imperfect models of the world, which are conditioned by how they’re trained on next-token prediction.

* We’ve shown we can drastically improve those world models for particular tasks with reinforcement learning (RL). You kind of allude to this already by talking about how they’ve been “flogged” to be good at math.

* I would claim that there’s no particular reason these RL techniques can’t, in principle, be extended to beat all sorts of benchmarks that might look unrealistic now. (Two years ago it would have been an extreme optimist position to say an LLM could get gold at the Mathematical Olympiad, and most LLM skeptics would probably have said it could never happen.)

* Of course, it’s very expensive, so most of the world models LLMs have won’t get the RL treatment and will be full of gaps, especially for things that aren’t amenable to RL. It’s good to beware of this.

I think the biggest limitation LLMs actually have, the one that is the biggest barrier to AGI, is that they can’t learn on the job, during inference. This means that with a novel codebase they are never able to build a good model of it, because they can never update their weights. (If an LLM were given tons of RL training on that codebase, it could build a better world model, but that’s expensive and very challenging to set up.) Your essay hints at this problem, but it doesn’t center the lack of on-the-job learning. Yet it’s the real elephant in the room with LLMs, and the one the boosters don’t really have an answer to.

Anyway, thanks for writing this and responding!

yosefk 10 days ago

I'm not saying that LLMs can't learn about the world - I even mention how they obviously do it, even at the learned embeddings level. I'm saying that they're not compelled by their training objective to learn about the world and in many cases they clearly don't, and I don't see how to characterize the opposite cases in a more useful way than "happy accidents."

I don't really know how they are made "good at math," and I'm not that good at math myself. With code I have a better gut feeling for the limitations. I do think you could throw them off terribly with unusual math questions to show that what they learned isn't math, but I'm not the guy to do it; my examples are about chess and programming, where I am more qualified. (You could say that my question about the associativity of blending and how caching works sort of shows that it can't use the concept of associativity in novel situations; I'm not sure whether that counts as an illustration of its weakness at math.)
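For readers who haven't seen the blending example, here is a minimal sketch of why associativity matters for caching. It assumes "blending" means standard alpha compositing with the Porter-Duff "over" operator on premultiplied RGBA colors; the essay's exact setup may differ, and the helper names (`over`, `composite`, `close`) are hypothetical. Because "over" is associative, the blend of every layer below the top one can be computed once, cached, and reused whenever only the top layer changes.

```python
import math
from functools import reduce

# Assumption: a pixel is premultiplied RGBA (r*a, g*a, b*a, a), floats in [0, 1].
Pixel = tuple[float, float, float, float]


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Porter-Duff 'over' on premultiplied colors: top + bottom * (1 - top_alpha)."""
    k = 1.0 - top[3]
    r, g, b, a = (t + s * k for t, s in zip(top, bottom))
    return (r, g, b, a)


def composite(layers: list[Pixel]) -> Pixel:
    """Blend a non-empty list of layers ordered top-first by folding 'over' from the left."""
    return reduce(over, layers)


def close(p: Pixel, q: Pixel) -> bool:
    """Compare pixels with a tolerance, since float rounding can differ by grouping."""
    return all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(p, q))


layers = [
    (0.2, 0.0, 0.0, 0.4),  # semi-transparent red on top
    (0.0, 0.3, 0.0, 0.5),  # half-transparent green
    (0.0, 0.0, 0.9, 0.9),  # mostly opaque blue at the bottom
]
a, b, c = layers

# Associativity: (a over b) over c == a over (b over c), up to rounding.
assert close(over(over(a, b), c), over(a, over(b, c)))

# Cache the blend of everything below the top layer once...
cached_below = composite(layers[1:])
# ...then, when only the top layer changes, blend the new top onto the cache.
new_top = (0.0, 0.1, 0.1, 0.2)
assert close(over(new_top, cached_below), composite([new_top] + layers[1:]))
```

The caching trick depends entirely on associativity: a naive blend such as averaging two layers, (x + y) / 2, is not associative, so caching a partial result would give a different image.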

calf 8 days ago

But this is parallel to saying LLMs are not "compelled" by the training algorithms to learn symbolic logic.

To me, that says there are two camps on this, and the jury is still out on this and all related questions.

teleforce 7 days ago

>LLMs are not "compelled" by the training algorithms to learn symbolic logic.

I think being "compelled" is such a uniquely human trait that machines will never replicate it to a T.

The article specifically mentions this very issue:

"And of course people can be like that, too - eg much better at the big O notation and complexity analysis in interviews than on the job. But I guarantee you that if you put a gun to their head or offer them a million dollar bonus for getting it right, they will do well enough on the job, too. And with 200 billion thrown at LLM hardware last year, the thing can't complain that it wasn't incentivized to perform."

If it's not already evident: an LLM on its own is, by definition, a limited stochastic AI tool, and its distant cousins are deterministic logic, optimization, and constraint programming [1],[2],[3]. Perhaps one of the two breakthroughs the author predicts will come from this deterministic domain, in support of the LLM, making the answer a hybrid approach rather than a pure LLM.

[1] Logic, Optimization, and Constraint Programming: A Fruitful Collaboration - John Hooker - CMU (2023) [video]:

https://www.youtube.com/live/TknN8fCQvRk

[2] "We Really Don't Know How to Compute!" - Gerald Sussman - MIT (2011) [video]:

https://youtube.com/watch?v=HB5TrK7A4pI

[3] Google OR-Tools:

https://developers.google.com/optimization

[4] MiniZinc:

https://www.minizinc.org/

calf 7 days ago

And yet there are two camps on the matter. Experts like Hinton disagree; others agree.

2muchcoffeeman 8 days ago

It’s not just on-the-job learning, though. I’m no AI expert, but the fact that you have “prompt engineers” and that AI doesn’t know what it doesn’t know gives me pause.

If you ask an expert, they know the bounds of their knowledge and can understand a question however it is phrased. If they don’t know the answer, they can point you to someone who does or just say “we don’t know”.

LLMs just lie to you, and we call it “hallucinating”, as though they will eventually get it right when the drugs wear off.

eru 8 days ago

> I’m no AI expert, but the fact that you have “prompt engineers” [...] gives me pause.

Why? A bunch of human workers can get a lot more done with a capable leader who helps prompt them in the right direction and corrects oversights etc.

And overall, prompt engineering seems like exactly the kind of skill AI will be able to develop by itself. You can already see a bit of this happening: when you ask Gemini to create a picture for you, the language part of Gemini takes your request and engineers a prompt for the picture part of Gemini.

intended 8 days ago

This is the goalpost flip that happens in AI conversations (if "goalpost" is even the right term; maybe "conversation switch"?).

There are two AI conversations on HN occurring simultaneously:

Convo A: Is it actually reasoning? Does it have a world model? Etc.

Convo B: Is it good enough right now? (for X, Y, or Z workflow)

eru 8 days ago

Maybe, yes. It's good to acknowledge that both of these conversations are worthwhile to have.

Mikhail_Edoshin 8 days ago

An LLM comprehends but does not understand. It is interesting to see these two qualities separated; until now they were synonyms.

eru 8 days ago

> A lot of that success is from reinforcement learning techniques where the LLM is made to solve tons of math problems after the pre-training “read everything” step, which then gives it a chance to update its weights. LLMs aren’t just trained from reading a lot of text anymore. It’s very similar to how the alpha zero chess engine was trained, in fact.

It's closer to AlphaGo, which first trained on expert human games and then 'fine tuned' with self-play.

AlphaZero specifically did not use human training data at all.

I am waiting for an AlphaZero-style general AI. ('General' not in the AGI sense, but in the ChatGPT sense of something you can throw general problems at and it will give them a good go, though not necessarily at human level, yet.) I just don't want to call it an LLM, because it wouldn't necessarily be trained on language.

What I have in mind is something that first solves lots and lots of problems (e.g. logic problems, formally posed programming problems, computer games, predicting the next frames of a webcam video, economic time series, whatever) as a sort of pre-training step, and then perhaps later you feed it a relatively small amount of human-readable text and speech so you can talk to it.

Just to be clear: this is not meant as a suggestion for how to successfully train an AI. I'm just curious whether it would work at all and how well / how badly.

Presumably there's a reason why all SOTA models go 'predict human produced text first, then learn problem solving afterwards'.

> I think the biggest limitation LLMs actually have, the one that is the biggest barrier to AGI, is that they can’t learn on the job, during inference. This means that with a novel codebase they are never able to build a good model of it, because they can never update their weights. [...]

Yes, I agree. But 'on-the-job' training is also such an obvious idea that plenty of people are working on making it work.