tripletao | 21 hours ago
Norvig's textbook surely appears on the bookshelves of researchers, including those building the current top LLMs. So it's odd to say that such an approach "may not even provide a good predictive model". As of today, it is unquestionably the best known predictive model for natural language, by a huge margin. I don't think that's for lack of trying, with billions of dollars or more at stake.

Whether that model provides "insight" (or a "cause"; I still don't know whether that's supposed to mean something different) is a deeper question, and e.g. the topic of countless papers trying to make sense of LLM activations. I don't think the answer is obvious, but I found Norvig's discussion thoughtful. I'm surprised to see it viewed so negatively here, dismissed with no engagement with his specific arguments and examples.
atomicnature | 21 hours ago
You can look into Judea Pearl's definitions of causality for more information. Pearl defines a ladder of causation:

1. Seeing (association)
2. Doing (intervention)
3. Imagining (counterfactuals)

In his view, most ML algorithms are at level 1: they look at data and draw associations. "Agents" have started taking some steps into level 2, doing. The smartest humans operate mostly at level 3, where they see things, gain experience, later build up a "strong causal model" of the world, and become capable of answering "what if" questions.
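To make the gap between levels 1 and 2 concrete, here's a minimal sketch (toy variables of my own, not an example from Pearl): with a confounder in play, the association you read off observational data differs from the effect you get by actually intervening on the treatment.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Toy structural causal model: confounder Z, binary treatment X, outcome Y.
    z = rng.binomial(1, 0.5, n)                  # confounder
    x = rng.binomial(1, 0.2 + 0.6 * z)           # Z makes X more likely
    y = 2.0 * z + 1.0 * x + rng.normal(0, 1, n)  # true causal effect of X on Y is +1

    # Level 1, seeing: E[Y | X=1] - E[Y | X=0] on observational data, confounded by Z.
    association = y[x == 1].mean() - y[x == 0].mean()

    # Level 2, doing: override X's mechanism, i.e. simulate do(X=1) and do(X=0).
    y_do1 = 2.0 * z + 1.0 + rng.normal(0, 1, n)
    y_do0 = 2.0 * z + rng.normal(0, 1, n)
    intervention = y_do1.mean() - y_do0.mean()

    print(f"association:  {association:.2f}")   # about 2.2, well above 1
    print(f"intervention: {intervention:.2f}")  # about 1, the true effect

Level 3 is harder to show in a few lines: it asks what Y would have been for the same individual had X been different, which requires the structural model itself, not just samples drawn from it.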
musicale | 3 hours ago
Thanks for the response, but (per the omitted portion of my sentence before the semicolon) I was not talking about the M in LLM. I was talking about a conceptual or analytic model that a human might develop to try to predict the behavior of an LLM, per Norvig's claim of insight derived from behavioral observation. But now that I think a bit about it, the observation that an LLM seems to frequently produce obviously and/or subtly incorrect output, that its behavior is not robust to prompt rewording, etc., is perhaps a useful Norvig-style insight.
foldr | 11 hours ago
Chomsky's talking about predictive models in the context of cognitive science. LLMs aren't really predictive models of any aspect of human cognitive function.
D-Machine | 13 hours ago
> I'm surprised to see it viewed so negatively here, dismissed with no engagement with his specific arguments and examples.

I struggle to motivate myself to engage with it because it is unfortunately quite out of touch with (or just ignores) some core issues and the major advances in causal modeling and causal modeling theory, i.e. Judea Pearl and do-calculus, structural equation modeling, counterfactuals, etc. [1]

It also, IMO, makes a (highly idiosyncratic) distinction between "statistical" models (meaning, trained / fitted to data) and "probabilistic" models that doesn't really hold up. I.e. probabilistic models in quantum physics are "fit" too, in that the values of fundamental constants are determined by experimental data, but these "statistical" models are clearly causal models regardless. Even most quantum physical models can be argued to be causal; it's just that the causality is probabilistic rather than absolute (i.e. A ==> B is fuzzy implication rather than absolute implication). It's only if you ask deliberately broad ontological questions (e.g. "does the wave function cause X?") that you actually run into the problem of whether quantum models are causal or not; for most quantum physical experiments, and phenomena generally, the models are still definitely causal at the level of the particles / waves / fields involved.

So, IMO, I don't want to engage much with the arguments because the essay starts on the wrong foot by making what is to me an incoherent / unsound distinction, while also ignoring (or just being out of date with) the actual scientific and philosophical progress already made on these issues.

I would also say there is a whole literature on the tradeoffs between explanation (descriptive models in the worst case, causal models in the best case) and prediction (models that accurately reproduce some phenomenon, regardless of whether they are based on a true description or causal model). There are also loads of examples of things that are perfectly deterministic and modeled by perfect "causal" models but which of course still defy human comprehension / intuition, in that the equations need to be run on computers for us to make sense of them (differential equation models, chaotic systems, etc.). Or, more practically: we can learn all sorts of physical and mental skills, yet we understand barely anything about the brain and how it works and coordinates with the body. But obviously such an understanding is mostly irrelevant for learning how to operate effectively in the world.

I.e. in practice, if the phenomenon is sufficiently complex, a causal model that also accurately models the system is likely to be too complex for us to "understand" anyway (or you have identifiability issues, so you can't decide between multiple different models; or you don't have the time / resources / measurement capacity to do all the experiments needed to solve the identifiability problem). So there is almost always a tradeoff between accuracy and understanding. Understanding is a nice luxury, but in many cases not important, and in complex cases probably not achievable at all. If you are coming from this perspective, the whole "quandary" of the essay just seems odd.
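To put a concrete face on the "perfectly causal but not humanly comprehensible" point, here's a minimal sketch (my example, the logistic map, not anything from the essay): a one-line deterministic model whose behavior you still have to compute rather than reason out, and where two starting points a hair apart end up in completely different places.

    # Logistic map: x_{n+1} = r * x_n * (1 - x_n). Fully deterministic, a perfect
    # "causal" model of itself, yet chaotic at r = 4: trajectories that start
    # 1e-10 apart diverge to completely different values within ~40 iterations.
    r = 4.0
    x_a, x_b = 0.2, 0.2 + 1e-10

    for step in range(51):
        if step % 10 == 0:
            print(f"step {step:2d}: x_a={x_a:.6f}  x_b={x_b:.6f}  |diff|={abs(x_a - x_b):.2e}")
        x_a = r * x_a * (1 - x_a)
        x_b = r * x_b * (1 - x_b)

No amount of staring at the update rule gives you an intuition for where x ends up after 50 steps; you run it. That's the sense in which a perfectly accurate causal model can still fail to deliver "understanding".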