intalentive 3 days ago
Funny you say that, this write-up recalled Stephen Grossberg's Adaptive Resonance Theory for me. The same basic ideas come up when addressing the stability-plasticity dilemma. That said, the authors are saving this for future work. Fine-tuning is cheaper, easier, and faster to validate.

> Switching to a new architecture at pretraining time has a high cost, but there are reasons we might want this (besides the better scaling behavior). The main benefit is that the model can learn to organize its memory from scratch, and once we’ve already “allocated” this high-capacity memory pool, there’s a clearer path to learning on multiple tasks and corpora over time.

This means you could "fine-tune" the model on your custom corpus at ingestion time, without having to actually train via backprop. Your corpus would be compressed into model-readable memory that updates model behavior. Then different memory units could be swapped in and out, like programs on a floppy disk. I can see this concept being especially useful for robotics.
yorwba 3 days ago | parent
The memory is model-readable but not model-writable, so you still need to train via backprop to get the memory to store useful data.
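A toy sketch of what that backprop-into-memory step could look like. Everything here (names, dimensions, the linear "model") is made up for illustration; the actual architecture under discussion is attention-based. The point is just that the model weights stay frozen and gradient descent updates only the memory vector, which then becomes a swappable artifact on disk:

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained model": output = W @ concat(x, memory).
# W is never updated after pretraining; only the memory is.
d_x, d_mem, d_out = 4, 8, 3
W = rng.normal(size=(d_out, d_x + d_mem))

def forward(x, memory):
    return W @ np.concatenate([x, memory])

def loss(pairs, memory):
    return sum(np.sum((forward(x, memory) - y) ** 2) for x, y in pairs)

def ingest(pairs, steps=500, lr=0.02):
    """Compress a 'corpus' of (input, target) pairs into a memory
    vector by gradient descent on the memory alone -- the backprop
    step that writing useful data into the memory still requires."""
    memory = np.zeros(d_mem)
    W_mem = W[:, d_x:]  # the frozen weights that read from memory
    for _ in range(steps):
        grad = np.zeros(d_mem)
        for x, y in pairs:
            err = forward(x, memory) - y
            grad += 2.0 * (W_mem.T @ err)  # d(loss)/d(memory)
        memory -= lr * grad / len(pairs)
    return memory

corpus = [(rng.normal(size=d_x), rng.normal(size=d_out)) for _ in range(5)]
mem_a = ingest(corpus)  # loss with mem_a is lower than with zero memory

# "Floppy disk" swap: the trained memory is just an array on disk
# that can be stored, shared, and loaded into the frozen model.
path = os.path.join(tempfile.gettempdir(), "mem_a.npy")
np.save(path, mem_a)
mem_b = np.load(path)
```

Reading from memory at inference time is cheap; it's the writing (fitting `mem_a` to the corpus) that still goes through a gradient-based optimization loop, even though the model weights themselves never move.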