mxkopy 3 days ago

There’s already a way to talk about this stuff. LLMs can “think” counterfactually on continuous data, just like VAEs [0]: they can interpolate smoothly between ‘concepts’, or projections of the input data. This is meaningless when the true input space isn’t actually smooth. It’s System 1, shallow-nerve, psychomotor-reflex thinking.
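
A rough sketch of what that latent interpolation looks like in practice (the encode/decode names are hypothetical, standing in for whatever trained VAE you have on hand):

    import numpy as np

    def interpolate_latents(vae, x_a, x_b, steps=8):
        # Encode two inputs, walk along the straight line between their
        # latent codes, and decode each point into a 'counterfactual' blend.
        z_a = vae.encode(x_a)
        z_b = vae.encode(x_b)
        for t in np.linspace(0.0, 1.0, steps):
            z = (1.0 - t) * z_a + t * z_b
            yield vae.decode(z)

    # If the true data manifold is smooth, the decoded blends look plausible;
    # if it isn't, the midpoints are meaningless in exactly the sense above.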

What LLMs can’t do is “think” counterfactually on discrete data: stuff like counting or adding integers. We do this very naturally because we can think discretely very naturally, but LLMs are bad at this sort of thing because the underlying assumption behind gradient descent is that everything has a gradient (i.e. is differentiable). Discrete rules have to be “burned in” [1], since continuous-valued weights are always subject to small perturbations that can break an exact rule.
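
A toy PyTorch illustration of why (my own example, not from [1]): a discrete-valued function is flat between its jumps, so backprop hands the optimizer a zero gradient and there is nothing to descend.

    import torch

    # round(x) is a staircase: constant between integers, so its derivative
    # is zero almost everywhere (and undefined at the jumps).
    x = torch.tensor(2.3, requires_grad=True)
    y = torch.round(x)    # discrete-valued output
    y.backward()

    print(y.item())       # 2.0
    print(x.grad.item())  # 0.0 -- no learning signal for the discrete rule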

You can replace “thinking” here with “information processing”. Does an LLM “think” any more or less than, say, a computer solving TSP on a very large input? Seeing as we can reduce the former to the latter, I wouldn’t say they’re really different at all. It seems like semantics to me.

In either case, counterfactual reasoning is good evidence of causal reasoning, which is typically one part of what we’d like AGI to be able to do (causal reasoning is deductive; the other part is inductive; this could be split into inference/training respectively, but the holy grail is having the two combined as zero-shot training). Regression is a basic form of counterfactual reasoning, and DL models are basically this. We don’t yet have a meaningful analogue for discrete, logic-puzzle-type problems, and that is the area where I’d say that LLMs don’t “think”.
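
To make the regression-as-counterfactual point concrete, a toy sketch with made-up numbers: fit a model to observed data, then query it at an input that was never observed.

    import numpy as np

    # Observed (x, y) pairs, roughly y = 2x.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line

    # Counterfactual query: what would y have been if x were 10?
    print(slope * 10.0 + intercept)             # ~20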

This is somewhat touched on in GEB (Gödel, Escher, Bach), and I suspect in “Fluid Concepts and Creative Analogies” as well.

[0] https://human-interpretable-ai.github.io/assets/pdf/5_Genera...

[1] https://www.sciencedirect.com/science/article/pii/S089360802...