trjordan 3 days ago

One of the funny things about LLMs and modern AI is that "the ability to recognize a cat" isn't a directly trained behavior anymore, as described here. It's an emergent property of training the model to predict a lot of things, and cats show up in the data often enough that they're one of the concepts you can ask a large model about and have it just work.

My favorite work on digging into models to explain this is Golden Gate Claude [0]. Basically, the folks at Anthropic went spelunking through the many-layer, many-parameter model and identified the combination of internal activations (a "feature") associated with the Golden Gate Bridge. Dialing that feature up to 11 made Claude bring up the bridge in response to literally everything.
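The core trick there is sometimes called activation steering: once you've found a direction in activation space associated with a concept, you can amplify it by adding a scaled copy of that direction to a layer's activations. Here's a toy NumPy sketch of the idea. Everything here (the `bridge_direction` vector, the `steer` function, the magic number 10) is illustrative, not Anthropic's actual method or API:

```python
import numpy as np

def steer(hidden, direction, strength):
    """Nudge activations `strength` units along a unit-length feature direction."""
    unit = direction / np.linalg.norm(direction)
    return hidden + strength * unit

rng = np.random.default_rng(0)
hidden = rng.normal(size=16)            # stand-in for one layer's activations
bridge_direction = rng.normal(size=16)  # made-up "Golden Gate Bridge" feature

steered = steer(hidden, bridge_direction, strength=10.0)

# The steered activations project much more strongly onto the feature,
# so downstream layers "see" a lot more bridge.
unit = bridge_direction / np.linalg.norm(bridge_direction)
print(hidden @ unit, steered @ unit)
```

In the real work, the feature directions come from training a sparse autoencoder on the model's activations rather than being random vectors, but the "add more of this direction" step is the same flavor of intervention.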

I'm super curious to see how much of this "intuitive" model of neural networks can be reverse-engineered effectively, and what that does to how we use these systems.

[0] https://www.anthropic.com/news/golden-gate-claude