Remix clone Hacker News


HarHarVeryFunny 6 days ago
Sure. The more you use RL to steer (narrow) the model's behavior in one direction, the more you stop it from generating other behaviors. RL and pre/post-training are not the answer.