Ifkaluva 2 hours ago

I think most ML models don’t have this property. Usually it’s assumed that the training samples are “independent and identically distributed” (i.i.d.).

This is the key insight behind the DQN algorithm’s replay buffer: instead of feeding in training examples as they arrive, it stores them and samples randomly from the buffer, because consecutive transitions are strongly temporally correlated and would destabilize learning.
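The idea can be shown in a few lines. This is a minimal sketch of a uniform replay buffer, not the full DQN training loop (the class and method names here are illustrative, not from any particular library):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer; uniform random sampling breaks temporal correlation."""

    def __init__(self, capacity):
        # oldest transitions fall off the front once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        # a transition would be (state, action, reward, next_state) in DQN
        self.buffer.append(transition)

    def sample(self, batch_size):
        # uniform sampling without replacement across the whole history
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=1000)
# transitions arrive in temporal order...
for t in range(100):
    buf.add((t, t + 1))
# ...but each training batch mixes transitions from across that history
batch = buf.sample(8)
```

The point is that `sample` hands the learner a batch whose elements come from many different points in time, so consecutive gradient updates are no longer computed on near-identical experiences.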

An easy way to wreck most ML models is to feed in the examples in a correlated order. For example, in a vision system that distinguishes cats from dogs, feed in all the cats first. Even worse, order the cats so there are minimal changes from one to the next: all the white cats first, each time picking the cat most similar to the previous one. That model will fail.
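That adversarial ordering is easy to construct: a greedy nearest-neighbour chain that always picks the remaining example closest to the previous one. A sketch with random feature vectors standing in for cat images (all names here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 8))  # stand-in feature vectors for 200 cat photos

def greedy_similarity_order(x):
    """Start anywhere; repeatedly pick the unvisited example
    closest to the previous one (greedy nearest-neighbour chain)."""
    remaining = set(range(len(x)))
    order = [0]
    remaining.remove(0)
    while remaining:
        prev = x[order[-1]]
        nxt = min(remaining, key=lambda i: np.linalg.norm(x[i] - prev))
        order.append(nxt)
        remaining.remove(nxt)
    return order

order = greedy_similarity_order(feats)

# mean distance between consecutive examples, sorted vs. arrival order
steps_sorted = np.mean([np.linalg.norm(feats[order[i + 1]] - feats[order[i]])
                        for i in range(len(order) - 1)])
steps_random = np.mean([np.linalg.norm(feats[i + 1] - feats[i])
                        for i in range(len(feats) - 1)])
```

`steps_sorted` comes out much smaller than `steps_random`: each example is nearly a duplicate of the last, so successive gradient updates all push the model in almost the same direction instead of averaging out as they would under i.i.d. sampling.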