charcircuit 7 hours ago

Memory is not just bolted on top of the latest models. They undergo training on how and when to use memory effectively, and on how to use compaction to avoid running out of context when working on problems.

rnxrx 6 hours ago | parent [-]

Maybe there's an analogy to our long- and short-term memory - immediate stimuli are processed in the context of deep patterns that have accreted over a lifetime. New information can absolutely challenge a lot of those patterns, but for that information to reshape how we basically think takes a lot longer - more processing, more practice, etc.

In the case of the LLM, the static weights produced by a finite training process are a proxy for that longer-term learning / fundamental structure, and the ability to use tools and store new insights and facts is analogous to shorter-term memory and "shallow" learning.

Perhaps periodic fine-tuning has an analogy in sleep, or even in our time spent in contemplation or practice (or even repetition) to truly "master" a new idea and incorporate it into our broader cognitive processing. We do an amazing job of this on a continuous basis, while the machines (at least at this point) perform the process in discrete steps.

If our own learning process is a curve, then the LLM's is a step function trying to model it. Digital vs analog.