Remix clone Hacker News

	parentheses 2 days ago
		I think a large issue at play here is post-training. Pre-training models the original distribution of the input data. RL techniques then tweak the models to "behave". This step changes how the models "think" in a fundamental way.
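To make the "changes how the models think" point a bit more concrete, here's a toy sketch. It's my own illustration, not any particular lab's pipeline. A common way to frame RL post-training is KL-regularized reward maximization: maximize expected reward minus beta times the KL divergence from the pre-trained (reference) model. That objective has a closed-form optimum that reweights the pre-trained distribution, pi*(y) ∝ p_ref(y) · exp(r(y) / beta). The three "responses", their pre-trained probabilities, and the reward scores below are all made up. The smaller beta is, the further the tuned distribution drifts from what pre-training learned.

```python
import math


def kl_regularized_policy(p_ref, reward, beta):
    """Optimum of max_pi E_pi[r] - beta * KL(pi || p_ref).

    Closed form: pi*(y) is proportional to p_ref(y) * exp(r(y) / beta).
    """
    weights = {y: p * math.exp(reward[y] / beta) for y, p in p_ref.items()}
    z = sum(weights.values())  # normalizing constant
    return {y: w / z for y, w in weights.items()}


def kl(p, q):
    """KL(p || q) for discrete distributions over the same keys."""
    return sum(p[y] * math.log(p[y] / q[y]) for y in p if p[y] > 0)


# Hypothetical toy "pre-trained" distribution over three possible responses.
p_ref = {"helpful": 0.2, "rude": 0.3, "off_topic": 0.5}
# Hypothetical reward-model scores for "behaving".
reward = {"helpful": 1.0, "rude": -1.0, "off_topic": 0.0}

for beta in (10.0, 1.0, 0.1):
    pi = kl_regularized_policy(p_ref, reward, beta)
    rounded = {y: round(v, 3) for y, v in pi.items()}
    print(f"beta={beta}: {rounded}  KL from pre-trained={kl(pi, p_ref):.3f}")
```

With a large beta the tuned distribution stays close to the pre-trained one. As beta shrinks, almost all probability mass moves to the rewarded response, and the KL divergence from the original distribution grows.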