soerxpso a day ago
I believe you're misunderstanding what the OP means by "long-term" memory. From what I can tell, it's not actively modifying the weights of the underlying model; it just "remembers" things from many tokens back in its context. The point is that this lets it recall something it read ~200 pages earlier in a very long context window, not that it carries anything from one session into another clean session.
AlexCoventry a day ago
This model has fast weights, which actually are modified during inference.
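
For anyone unfamiliar with the term: "fast weights" (in the sense of Schmidhuber's fast weight programmers and the linear-attention literature) are a small set of parameters that get written to on the fly at inference time, typically via an outer-product rule, while the slow weights learned in pretraining stay frozen. A minimal NumPy sketch, assuming a simple Hebbian update with decay (the dimensions, decay rate, and learning rate are illustrative, not whatever this model actually does):

    import numpy as np

    class FastWeightLayer:
        """Toy fast-weight memory: F is rewritten at inference time,
        while the projection matrices (the "slow" weights) stay fixed."""

        def __init__(self, d, decay=0.95, lr=0.5, seed=0):
            rng = np.random.default_rng(seed)
            # Slow weights: learned during training, frozen at inference.
            self.Wk = rng.standard_normal((d, d)) / np.sqrt(d)
            self.Wv = rng.standard_normal((d, d)) / np.sqrt(d)
            self.Wq = rng.standard_normal((d, d)) / np.sqrt(d)
            # Fast weights: start empty, written to as tokens stream in.
            self.F = np.zeros((d, d))
            self.decay, self.lr = decay, lr

        def step(self, x):
            k, v, q = self.Wk @ x, self.Wv @ x, self.Wq @ x
            # Hebbian outer-product write: the layer's weights change
            # during inference -- this is the "fast weight" part.
            self.F = self.decay * self.F + self.lr * np.outer(v, k)
            # Read: query the fast-weight matrix for stored associations.
            return self.F @ q

    layer = FastWeightLayer(d=16)
    for x in np.random.default_rng(1).standard_normal((200, 16)):
        y = layer.step(x)  # F now encodes traces of all 200 inputs

The contrast with plain long-context attention is that the memory lives in F, a fixed-size matrix that gets mutated, rather than in an ever-growing KV cache, which is why it's fair to say weights are "modified during inference" even though the pretrained weights never change.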