rvz 2 days ago

Good.

We finally have a viable mousetrap for LLM scrapers: it lets them scrape garbage forever, wasting their resources, while the LLM is fed garbage that will be unusable to the trainer, accelerating model collapse.

It is like a never-ending fast-food restaurant where LLMs are forced to eat garbage input, which will degrade the quality of any model trained on it later.

Hope to see this sort of defense used widely to protect websites from LLM scrapers.
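The tarpit idea described above can be sketched in a few lines: serve an endless maze of deterministic garbage pages, each linking to more garbage. This is a minimal hypothetical sketch, not Nepenthes' actual implementation:

```python
import hashlib
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

def garbage_page(path: str, n_words: int = 80, n_links: int = 5) -> str:
    """Generate a stable garbage page for any URL path.

    Seeding the RNG from the path means every URL 'exists' and always
    returns the same content, so the maze looks real but is infinite.
    """
    seed = int(hashlib.sha256(path.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    words = " ".join(
        "".join(rng.choice("abcdefghijklmnopqrstuvwxyz")
                for _ in range(rng.randint(3, 9)))
        for _ in range(n_words)
    )
    # Each page links to n_links fresh random paths, so a crawler
    # following links never runs out of pages to fetch.
    links = "".join(
        f'<a href="/{rng.getrandbits(64):x}">more</a> ' for _ in range(n_links)
    )
    return f"<html><body><p>{words}</p>{links}</body></html>"

class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = garbage_page(self.path).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

# To run the tarpit (blocks forever):
# HTTPServer(("127.0.0.1", 8080), TarpitHandler).serve_forever()
```

A real tarpit would also throttle responses (drip-feeding bytes to tie up crawler connections), but the core trick is just this: infinite cheap-to-generate pages behind links only a bot would follow.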

bwfan123 2 days ago | parent [-]

indeed. this will spur research on how to distinguish BS from legit content, which is the fundamental hallucination problem in llms.

and all of us will benefit from this.

ezrast a day ago | parent [-]

You can't programmatically detect novel BS any more than you can programmatically detect novel viruses or spam. You can only add the fingerprints of known badness to an ever-growing database. Viruses and spam are antagonistic to well-resourced institutions, so those databases get maintained reasonably well. LLM slop is being generated by those same well-resourced institutions. I don't think it fits into the same category as Nepenthes.
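The fingerprint-database approach described above is easy to sketch, and the sketch also shows its weakness: exact fingerprints catch only content you've already seen. This is a toy illustration (real spam/antivirus systems use fuzzy or perceptual hashes, not plain SHA-256):

```python
import hashlib

class FingerprintDB:
    """Toy blocklist: store hashes of known-bad content, check new content.

    Exact hashing means any novel or slightly altered content slips
    through, which is the commenter's point about novel BS.
    """

    def __init__(self) -> None:
        self._known: set[str] = set()

    def _fingerprint(self, content: str) -> str:
        return hashlib.sha256(content.encode()).hexdigest()

    def add(self, content: str) -> None:
        """Record a piece of known-bad content."""
        self._known.add(self._fingerprint(content))

    def is_known_bad(self, content: str) -> bool:
        """True only for content already in the database, byte-for-byte."""
        return self._fingerprint(content) in self._known
```

Even a one-character change to known slop produces a different hash, so the database only ever trails behind whatever is generating the content.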