Remix clone Hacker News

	▲	jmyeet 3 hours ago
		Yeah, this is something I've been thinking about too. LLMs have basically profited from "stealing" (arguably) user-generated content from a time when there were no LLMs. In the LLM era there won't be a new Stack Overflow to train LLMs on going forward. We're getting closer to Dead Internet Theory too, where a lot of accounts, particularly on Twitter, are just LLMs. I imagine it's a huge problem on Reddit too: people farming karma, running influence campaigns, or simply grifting for ad revenue. So we're going to get to a point where the corpus we train LLMs on will itself just be filled with LLM slop. Self-reinforcing slop. Is that the future?
	▲	aucisson_masque 2 hours ago
		It's been studied: LLMs that feed on their own data regress, and the results become very bad after a few generations.
	▲	mattmanser 3 hours ago
		It's happening here too. I saw dang hint that they're not even responding to a lot of questions about it anymore because of the volume of the problem. If you browse with showdead on, you'll see a lot more of what look like reasonable comments greyed out.