Remix clone Hacker News

	▲	mike_hearn 5 hours ago
		Nah, the reason models have a left-wing bias is that the training set does. It's full of output from word factories like academia, journalism and online forums moderated by leftists (e.g. Reddit). In fields where lots of RLVR is possible we can say the synthetically enhanced set somehow reflects reality, but otherwise it just reflects words, which are only a rough proxy for reality. Cleaning the dataset of this stuff is hard, partly because it's difficult to precisely specify what you want to remove. "Left-wing views" isn't well defined.