Remix clone Hacker News

	▲	ManlyBread 3 days ago
		> In fact, this paper found that more than that, it thinks American.

		I think that's because it seems to be primarily trained on Reddit and therefore mirrors everything Reddit stands for. Not a good thing, considering just how overrun the site is with bots and political activists of all kinds.
	▲	rollcat 3 days ago
		You're absolutely right! Social media sites like Reddit are overrun with bots, sycophants, and trolls trying to provoke reactions by engaging in controversial topics. This forms echo chambers, which are a sub-par source of training data, and their biases end up reflected in LLM responses.
	▲	TimByte 3 days ago
		I wonder how much of that actually survives token filtering during training.