| ▲ | jmyeet 3 hours ago | |
Yeah, this is something I've been thinking about too. LLMs have basically profited from (arguably) "stealing" user-generated content from a time before LLMs existed. In the LLM era there won't be a new Stack Overflow to train LLMs on going forward. We're also getting closer to Dead Internet Theory, where a lot of accounts, particularly on Twitter, are just LLMs. I imagine it's a huge problem on Reddit too: people farming karma, running influence campaigns, or simply grifting for ad revenue. So we're going to reach a point where the corpus we train LLMs on is itself filled with LLM slop. Self-reinforcing slop. Is that the future? | ||
| ▲ | aucisson_masque 2 hours ago | parent | next [-] | |
It's been studied: LLMs that feed on their own output regress, and the results get very bad after a few generations. | ||
| ▲ | mattmanser 3 hours ago | parent | prev [-] | |
It's happening here too. I saw dang hint that they're not even responding to a lot of questions about it anymore because of the sheer volume. If you browse with showdead on, you'll see a lot more greyed-out comments that look reasonable at first glance. | ||