harperlee a day ago

Two handwavey ideas upon reading this:

- Even for billion-parameter theories, a small number of vectors might dominate the behaviour. A coordinate-shift approach (PCA) might surface new concepts that enable us to model that phenomenon. "A change in perspective is worth 80 IQ points", said Alan Kay.

- There is an analogue of how we come up with cognitive metaphors of the mind ("our models of the mind resemble our latest technology: abacus, mechanisms, computer, neural network") that could be applied to other complicated areas of reality.
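The coordinate-shift idea in the first bullet can be sketched with a toy PCA. Everything here is an invented illustration (the data, sizes, and noise scale are assumptions), just showing how a few directions can dominate a high-dimensional parameter cloud:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a big parameter space: 500 parameter vectors in 50
# dimensions whose variance actually lives in 3 hidden directions.
basis = rng.normal(size=(3, 50))
data = rng.normal(size=(500, 3)) @ basis + 0.01 * rng.normal(size=(500, 50))

# PCA via SVD of the centered data matrix.
centered = data - data.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)  # fraction of variance per component

# The top 3 components capture nearly all of the variance.
print(explained[:5])
```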

pash 20 hours ago | parent | next

> Even for billion-parameter theories, a small number of vectors might dominate the behaviour.

We kinda-sorta already know this is true. The lottery-ticket hypothesis [0] says that every large, randomly initialized network contains a small sub-network that can perform as well as the full network, and over the past eight years or so researchers have indeed found such small networks inside large networks of many different architectures.

Nobody talks much about the lottery-ticket hypothesis these days because it isn’t practically useful at the moment. (With the pruning algorithms and hardware we have, pruning is more costly than just training a big network.) But the basic idea does suggest that there may be hope for interpretability, at least in the odd application here or there.

That is, the (strong) lottery-ticket hypothesis suggests that training is a search through a large parameter space for a small network that already (by random initialization) exhibits the desired behavior; updating parameters during training is mostly about turning off the irrelevant parts of the network.

For some applications, one would think that the small sub-network hiding in there somewhere might be small enough to be interpretable. I won’t be surprised if some day not too far into the future scientists investigating neural networks start to identify good interpretable models of phenomena of intermediate complexity (those phenomena that are too complex to be amenable to classic scientific techniques, but simple enough that neural networks trained to exhibit the phenomena yield unusually small active sub-networks).

0. https://en.wikipedia.org/wiki/Lottery_ticket_hypothesis

seanlinehan 19 hours ago | parent

Super interesting, I've never heard of this before. Thanks for sharing!

_hark 15 hours ago | parent | prev | next

You literally can do a kind of model PCA, using the Hessian (matrix of second derivatives of the loss function w/r/t the parameters, aka the local curvature of the loss landscape), and diagonalizing. These eigenvectors and eigenvalues (the spectrum of the Hessian) tend to be power-law distributed in just about every deep NN you can think of [1].

That is, there are a few "really important" (highly curved) dimensions in parameter space (the top eigenvectors) which control the model's performance (the loss function). Conversely, there are very many "unimportant"/low curvature dimensions in the model. There was a recent interesting paper that showed that "deleting" these low-curvature dimensions appeared to correspond to removing "memorized" information in LLMs, such that their reasoning performance was left unchanged while their ability to answer questions which require some memorized knowledge was reduced [2].
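The recipe described above can be sketched numerically. This is a toy illustration, not the setup from the cited paper: the data, network size, and hyperparameters are all invented, and the Hessian is built by finite differences rather than autodiff.

```python
import numpy as np

rng = np.random.default_rng(2)

# Tiny 1-hidden-layer regression net on toy data.
X = rng.normal(size=(200, 3))
y = np.tanh(X @ np.array([1.0, -2.0, 0.5]))

def loss(w):
    W1 = w[:12].reshape(3, 4)  # input -> hidden weights
    W2 = w[12:16]              # hidden -> output weights
    return np.mean((np.tanh(X @ W1) @ W2 - y) ** 2)

def grad(w, eps=1e-5):
    # central-difference gradient of the loss
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (loss(w + e) - loss(w - e)) / (2 * eps)
    return g

w = rng.normal(size=16) * 0.5
loss0 = loss(w)
for _ in range(2000):  # crude gradient descent toward a minimum
    w -= 0.1 * grad(w)
lossT = loss(w)

# Hessian via finite differences of the gradient, symmetrized,
# then diagonalized to get the spectrum.
eps = 1e-4
H = np.zeros((16, 16))
for i in range(16):
    e = np.zeros(16)
    e[i] = eps
    H[:, i] = (grad(w + e) - grad(w - e)) / (2 * eps)
H = (H + H.T) / 2
eigvals = np.linalg.eigvalsh(H)[::-1]  # descending

print(loss0, lossT)
print(eigvals)  # typically a few large eigenvalues, many near zero
```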

It appears that sometimes models undergo dramatic transitions from memorization to perfect generalization, which corresponds to the models becoming much more compressible [3].

I'm hopeful that we'll find a way to distill models down to a core of useful cognitive/reasoning capabilities, and that that core will be far simpler than the current scale of LLMs. But without all that memorized world knowledge, they might need to look stuff up like we do!

[1]: https://openreview.net/pdf?id=o62ZzfCEwZ

[2]: https://www.goodfire.ai/research/understanding-memorization-...

[3]: https://arxiv.org/abs/2412.09810

aldousd666 21 hours ago | parent | prev | next

I don't disagree, but neither does the article. It's just pointing out that we previously considered anything that can't be easily and tersely written down to be nearly or entirely intractable. But, as we have seen, the three-body problem is not really a humdinger as far as the universe goes; it's not even table stakes. We need to be able to do the same kind of energy arbitrage on n-body problems that we do on two-body problems. And now we have the beginnings of a place to toy with more complicated ideas -- since these won't fit on a blackboard.

pixl97 21 hours ago | parent

Problems with opaque stability boundaries that exhibit non-linear effects are always great. Chaos theory makes it even more fun, as your observation can change the outcome.

simianwords a day ago | parent | prev

Maybe we can come up with smaller models that perform almost as well as the bigger ones. Could that just be PCA of some kind?

GPT nano vs GPT-5, for example.