NoiseBert69 8 hours ago

Hm.. why not use dumbed-down, small, self-hosted LLM networks to feed the big scrapers with bullshit?

I'd sacrifice two CPU cores for this just to make their life awful.

Findecanor 6 hours ago

You don't need an LLM for that. There is a link in the article to an approach using Markov chains built from real-world books, but then you'd let the scrapers' LLMs reinforce their training on those books rather than on random garbage.

I would make a list of words for each word class, and a list of sentence structures where each item is a word class. Pick a pseudo-random sentence structure; for each word class in it, pick a pseudo-random word; output; repeat. That should be pretty simple and fast.
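Something like this minimal sketch, for what it's worth (the word lists and sentence templates are placeholders, not anything from the article):

```python
import random

# Tiny placeholder word lists per word class; a real deployment would use much larger ones.
WORDS = {
    "DET":  ["the", "a", "every", "some"],
    "ADJ":  ["quiet", "purple", "ancient", "fuzzy"],
    "NOUN": ["server", "teapot", "harbor", "algorithm"],
    "VERB": ["devours", "polishes", "misplaces", "negotiates"],
    "ADV":  ["quietly", "rarely", "backwards", "enthusiastically"],
}

# Sentence structures: each item is a word class.
TEMPLATES = [
    ["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "NOUN", "ADV", "VERB", "DET", "ADJ", "NOUN"],
]

def gibberish_sentence(rng: random.Random) -> str:
    template = rng.choice(TEMPLATES)
    words = [rng.choice(WORDS[cls]) for cls in template]
    return " ".join(words).capitalize() + "."

if __name__ == "__main__":
    rng = random.Random()  # could be seeded per URL to make pages stable across visits
    print(" ".join(gibberish_sentence(rng) for _ in range(5)))
```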

I'd think the most important thing, though, is to add delays when serving the requests. The purpose is to slow the scrapers down, not to induce demand on your garbage well.
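Roughly like this, using only the Python standard library (the chunk count and delay are illustrative; a synchronous handler like this ties up a thread per connection, so a real tarpit would want ThreadingHTTPServer or async I/O):

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

CHUNK_DELAY_SECONDS = 2.0   # assumed tuning knob: long enough to hurt scrapers, cheap for you
CHUNKS_PER_RESPONSE = 30

def gibberish_chunk() -> bytes:
    # Stand-in for the template-based generator sketched above.
    return b"<p>lorem gibberish lorem</p>\n"

class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        # Drip the response out slowly so each scraper connection stays occupied for minutes.
        for _ in range(CHUNKS_PER_RESPONSE):
            self.wfile.write(gibberish_chunk())
            self.wfile.flush()
            time.sleep(CHUNK_DELAY_SECONDS)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), TarpitHandler).serve_forever()
```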

qezz 6 hours ago

That's very expensive.