skissane 3 hours ago
Models have some limited means of refinement available to them already: augment a model with any form of external memory, and it can learn by writing to its memory and then reading relevant parts of that accumulated knowledge back in the future. Of course, this is a lot more rigid than what biological brains can do, but it isn't nothing.

Does "distributional drift and mode collapse" still happen if the outputs are filtered against some external ground truth - e.g., human preferences, or even (in certain restricted domains such as coding) automated evaluations?
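
Roughly what I mean, as a toy sketch (Python; every name here is hypothetical, and both the model call and the "ground truth" check are stubs): the model reads relevant notes back in before answering, and only writes a new note when some external signal endorses the output, so the memory accumulates filtered knowledge rather than the model's own drift.

    # Toy sketch of external-memory augmentation with an external filter.
    # Hypothetical names throughout; the memory is just a list with naive
    # keyword retrieval, not a vector store.
    from dataclasses import dataclass, field

    @dataclass
    class ExternalMemory:
        """Append-only notes the model can write and later read back."""
        notes: list[str] = field(default_factory=list)

        def write(self, note: str) -> None:
            self.notes.append(note)

        def read(self, query: str, k: int = 3) -> list[str]:
            # Naive relevance: count words shared between query and note.
            words = set(query.lower().split())
            ranked = sorted(
                self.notes,
                key=lambda n: len(words & set(n.lower().split())),
                reverse=True,
            )
            return ranked[:k]

    def call_model(prompt: str) -> str:
        """Stand-in for whatever LLM API you actually use."""
        return f"(model output for: {prompt})"

    def passes_external_check(output: str) -> bool:
        """Stand-in for ground truth: run tests, ask a human, etc."""
        return "error" not in output.lower()

    def answer_with_memory(task: str, memory: ExternalMemory) -> str:
        # Read accumulated knowledge relevant to this task back into context.
        context = "\n".join(memory.read(task))
        output = call_model(f"Notes so far:\n{context}\n\nTask: {task}")
        # Only persist outputs that the external signal endorses.
        if passes_external_check(output):
            memory.write(f"Task: {task} -> {output}")
        return output

    if __name__ == "__main__":
        mem = ExternalMemory()
        print(answer_with_memory("refactor the parser module", mem))
        print(answer_with_memory("add tests for the parser module", mem))

The interesting question is whether gating the write step on an external check like this is enough to keep the accumulated "knowledge" anchored, or whether drift creeps in anyway.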