HarHarVeryFunny 6 days ago

Sure - the more you use RL to steer/narrow the behavior of the model in one direction, the more you are stopping it from generating other behaviors.

RL and pre/post training are not the answer.