bluegatty 2 hours ago
That's not how training works - adjusting model weights to memorize a single data item is not going to fly. Model weights store abilities, not facts - generally. The exception is a fact that's very widely used and widely known, with a ton of context around it. The model can learn the day JFK died because there are millions of sparse examples of that information scattered through the world's text, but when you're working on a problem, you might have one concern to 'memorize'. That requires something different from adjusting model weights as we understand them today.

LLMs are not mammals, either. The analogy is helpful in terms of 'what a human might find useful', but not necessarily in terms of actual LLM architecture. The fact is, we don't have memory sorted out architecturally - it's either context or weights, and that's that.

Also, critically: humans do not remember the details of a face. Not remotely. They're able to associate it with a person and a name if they see it again - but that's different from excellent recall. Ask someone to describe a face's features in detail and most of us can't do it. You can see how this might be a kind of 'soft lookup': associating an input with other bits of information that rise to the fore as possibly useful.

But overall, yes, it's fair to take the position that we'll have to 'learn from context' in some way.
observationist 2 hours ago
Also, with regard to faces, that's kind of what I'm getting at - we don't have grid cells for faces; there seem to be discrete, functional, evolutionarily shaped structures and capabilities that combine in ways we're not consciously aware of to provide abilities. We're reflexively able to memorize faces, but bringing that to consciousness isn't automatic.

There have been amnesia, lesion, and other injury studies in which people with face blindness show stress, anxiety, or relief when recognizing a face without being consciously aware of it. A doctor, or a person they didn't like, showing up caused stress spikes, but they couldn't tell you who the person was or their name. The same goes for family members - patients get a physiological, hormonal response as if they recognized a friend or foe, but it never rises to the level of conscious recognition. There do seem to be complex cells that allow association with a recognizable face, person, icon, object, or distinctive thing, and they apply equally to abstractions like logos or UI elements in an app as they do to people, famous animals, unique audio stings, etc. Split-brain patients also demonstrate amazing strangeness with memory and subconscious responses.

There are all sorts of layers to human memory beyond short term, long term, REM, memory palaces, and so forth. There's no simple, singular function of 'memory' in biological brains, but rather a suite of different strategies and a pipeline that roughly slots into the fuzzy bucket words we use for them today.
observationist 2 hours ago
I suspect we're going to need hypernetworks of some sort: dynamically generated weights, with the hypernet weights getting dream-like reconsolidation and mapping into the model at large, and layers or entire experts generated from the hypernets on the fly - a degree removed from the direct-from-weights inference being done now.

I've been following some of the token-free latent reasoning work and other discussions around CoT and reasoning scaffolding, and you just can't overcome the missing-puzzle-piece problem elegantly unless you have online memory. In the context of millions of concurrent users, that also becomes a nightmare. What you'd want is a pipeline with a sort of intermediate memory - constructive and dynamic, to allow resolution of problems requiring integration with memorized concepts and functions, but held out for curation and stability.

It's an absolutely enormous problem, and I'm excited that it seems to be one of the primary research efforts kicking off this year. It could be a huge step change in capabilities.
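To make the hypernetwork idea concrete, here's a minimal toy sketch (all names and dimensions are illustrative, not any specific proposal): instead of a layer reading fixed trained weights, a small generator network maps a context vector to that layer's weights, so the effective parameters change per context rather than being frozen at training time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: a 4-d "context" embedding generates the
# parameters of a 3x2 linear layer (3*2 weights + 2 biases = 8 values).
ctx_dim, in_dim, out_dim = 4, 3, 2
n_params = in_dim * out_dim + out_dim

# The hypernetwork here is just a single linear map: context -> parameters.
# In a real system this would itself be a trained network.
H = 0.1 * rng.normal(size=(ctx_dim, n_params))

def generate_layer(context):
    """Generate the target layer's weights from a context vector."""
    flat = context @ H                              # shape (n_params,)
    W = flat[: in_dim * out_dim].reshape(in_dim, out_dim)
    b = flat[in_dim * out_dim :]
    return W, b

def forward(x, context):
    """Run the target layer with context-generated (not fixed) weights."""
    W, b = generate_layer(context)
    return x @ W + b

x = rng.normal(size=(in_dim,))
ctx_a = rng.normal(size=(ctx_dim,))
ctx_b = rng.normal(size=(ctx_dim,))

# Same input, two different contexts -> two different effective layers,
# hence different outputs: the "memory" lives in the generated weights.
out_a = forward(x, ctx_a)
out_b = forward(x, ctx_b)
```

The point of the sketch is just the indirection: inference reads weights that were produced a moment ago from context, rather than weights baked in by training - which is where the curation/reconsolidation step would slot in.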