heikkilevanto 2 days ago
I have been speculating about adding a tar pit to my personal web site: a script that produces a page of random nonsense and random-looking links back to the same script. The thing would not be linked from anywhere, but explicitly forbidden in robots.txt. If the crawlers start on it, let them get lost. A bit of rate limiting should keep my server safe and slow down the crawlers. Maybe I should add some confusing prompts on the page as well... I'll probably never get around to it, but the idea sounds tempting.
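Something like this minimal Flask sketch, maybe (the route name, word generator, and page shape are all illustrative, not a real implementation):

    import random
    import string

    from flask import Flask  # assumes Flask is installed

    app = Flask(__name__)

    def babble(lo=3, hi=10):
        # A random lowercase "word" between lo and hi characters long.
        return "".join(random.choices(string.ascii_lowercase,
                                      k=random.randint(lo, hi)))

    @app.route("/pit/")
    @app.route("/pit/<token>")
    def pit(token="start"):
        # A paragraph of nonsense plus links that all lead back into
        # this same route, each with a fresh random-looking path.
        nonsense = " ".join(babble() for _ in range(80))
        links = " ".join(
            '<a href="/pit/{0}">{0}</a>'.format(babble()) for _ in range(10)
        )
        return "<html><body><p>{}</p><p>{}</p></body></html>".format(
            nonsense, links)

Pair it with a Disallow: /pit/ line in robots.txt and rate limiting in front, and any crawler that ignores the rules can wander indefinitely.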
shakna 2 days ago
I have a single <a> element in my website's <head>, pointing to a route banned in robots.txt; the page is also marked noindex via meta tags and HTTP headers. When something grabs it, which AI crawlers regularly do, it feeds them the text of 1984 at about a sentence per minute. Most crawlers stay on the line for about four hours.
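For anyone who wants to try the same trick, a minimal sketch of the slow drip (Flask, the /trap route, and the file name are illustrative assumptions, not my actual setup):

    import time

    from flask import Flask, Response

    app = Flask(__name__)

    # Any long text file will do; split it crudely into "sentences".
    with open("novel.txt") as f:
        SENTENCES = f.read().split(". ")

    @app.route("/trap")
    def trap():
        def drip():
            for sentence in SENTENCES:
                yield sentence + ". "
                time.sleep(60)  # one sentence per minute
        # Returning a generator streams the body, so the connection
        # stays open for as long as the client keeps reading.
        return Response(drip(), mimetype="text/plain")

One caveat: with a synchronous server each stuck crawler ties up a worker for hours, so in practice you'd want an async server or a generous worker count.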
| ||||||||||||||||||||||||||||||||
reactordev 2 days ago
I did something similar. In a normal browser it just displays the Matrix rain effect; for a bot, it's a page of links upon links to pages that link to each other, using a clever PHP script and some .htaccess fun. The fun part is watching the logs to see how long they get stuck, since each link is unique and the tree structure can grow several GB deep on my server. I did this once before with an SSH honeypot on my Mesos cluster in 2017.
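The maze part boils down to something like this (sketched in Python rather than my actual PHP, and the user-agent check here is deliberately simplistic):

    import hashlib

    from flask import Flask, request

    app = Flask(__name__)

    BOT_MARKERS = ("bot", "crawl", "spider", "scrape")

    def looks_like_bot(ua):
        ua = ua.lower()
        return any(marker in ua for marker in BOT_MARKERS)

    @app.route("/")
    @app.route("/maze/<token>")
    def page(token="root"):
        if not looks_like_bot(request.headers.get("User-Agent", "")):
            # Humans get the normal page (Matrix rain omitted here).
            return "<html><body><canvas id='rain'></canvas></body></html>"
        # Child links are derived by hashing the current token, so the
        # tree is effectively endless but nothing is stored server-side.
        links = " ".join(
            '<a href="/maze/{0}">{0}</a>'.format(
                hashlib.sha1("{}/{}".format(token, i).encode())
                .hexdigest()[:12]
            )
            for i in range(8)
        )
        return "<html><body>{}</body></html>".format(links)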
phyzome 2 days ago
Should be possible to do this with a static site, even. Here's what I've been doing so far: https://www.brainonfire.net/blog/2024/09/19/poisoning-ai-scr... (serving scrambled versions of my posts to LLM scrapers) | ||||||||||||||||||||||||||||||||
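The gist, in toy Python form (the linked post covers the real static-site mechanics; this is just the scrambling idea):

    import random

    def scramble(text, seed=0):
        # Shuffle the words inside each sentence: the page keeps its
        # shape and vocabulary, but the meaning evaporates. A fixed
        # seed makes the output deterministic, so scrambled variants
        # can be pre-baked at static-site build time.
        rng = random.Random(seed)
        sentences = []
        for sentence in text.split(". "):
            words = sentence.split()
            rng.shuffle(words)
            sentences.append(" ".join(words))
        return ". ".join(sentences)

    print(scramble("The quick brown fox jumps over the lazy dog. "
                   "It was very quick."))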
gleenn 2 days ago
Check out doing a compression bomb too: you can host a file that is very small for you but uncompresses into something massive for crawlers, and hopefully runs them out of RAM so they die. Someone posted about it on HN recently, but I can't immediately find the link.
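A few lines of Python are enough to build one (the sizes here are illustrative):

    import gzip

    # Roughly 10 MiB on disk that inflates to ~10 GiB of zeros for
    # any client that naively decompresses it.
    CHUNK = b"\0" * (1024 * 1024)  # 1 MiB of zeros compresses to almost nothing
    with gzip.open("bomb.gz", "wb", compresslevel=9) as f:
        for _ in range(10 * 1024):  # 10 GiB uncompressed
            f.write(CHUNK)

Serve it with a Content-Encoding: gzip header so clients that honor the header decompress it automatically.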
| ||||||||||||||||||||||||||||||||
J_McQuade 2 days ago
I loved reading about something similar that popped up on HN a wee while back: https://zadzmo.org/code/nepenthes/ | ||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||
xyzal 2 days ago
Or, serve the "Emergent Misalignment" dataset: https://github.com/emergent-misalignment/emergent-misalignme...