heikkilevanto 2 days ago

I have been speculating about adding a tar pit to my personal web site: a script that produces a page of random nonsense and random-looking links back to the same script. The thing wouldn't be linked from anywhere, but would be explicitly forbidden in robots.txt. If crawlers start on it, let them get lost. A bit of rate limiting should keep my server safe and slow down the crawlers. Maybe I should add some confusing prompts to the page as well... I'll probably never get around to it, but the idea sounds tempting.
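
For the curious, a minimal sketch of what such a tarpit script could look like, using only Python's standard library. The /maze/ path, word list, and port are placeholders, and a real deployment would still want rate limiting plus a matching Disallow: /maze/ line in robots.txt:

    # Every path under /maze/ returns deterministic pseudo-random text plus
    # links deeper into the maze, so each URL is stable but unique.
    import hashlib
    import random
    from http.server import BaseHTTPRequestHandler, HTTPServer

    WORDS = ["lorem", "ipsum", "quantum", "synergy", "paradigm",
             "flux", "entropy", "vector", "cascade", "nexus"]

    class Tarpit(BaseHTTPRequestHandler):
        def do_GET(self):
            # Seed the RNG from the path so revisits see the same page.
            seed = int(hashlib.sha256(self.path.encode()).hexdigest(), 16)
            rng = random.Random(seed)
            text = " ".join(rng.choice(WORDS) for _ in range(200))
            links = "".join(f'<a href="/maze/{rng.getrandbits(64):x}">more</a> '
                            for _ in range(10))
            body = f"<html><body><p>{text}</p>{links}</body></html>".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8080), Tarpit).serve_forever()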

shakna 2 days ago | parent | next [-]

I have a single <a> element in my website's <head>, pointing to a route banned by robots.txt; the page is also marked noindex via meta tags and HTTP headers.

When something grabs it, which AI crawlers regularly do, it feeds them the text of 1984, about a sentence per minute. Most crawlers stay on the line for about four hours.
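
Not my actual source, but a rough sketch of how such a slow drip could work as a bare WSGI app; the filename and the 60-second delay are assumptions, and a real server would need threading so one stuck crawler doesn't block everyone else:

    # Streams a text file roughly one sentence per minute, keeping the
    # crawler's connection open for hours. Illustrative only.
    import time
    from wsgiref.simple_server import make_server

    def sentences(path="1984.txt"):
        text = open(path, encoding="utf-8").read()
        return [s.strip() + "." for s in text.split(".") if s.strip()]

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        def drip():
            for sentence in sentences():
                yield (sentence + "\n").encode()
                time.sleep(60)  # about one sentence per minute
        return drip()

    if __name__ == "__main__":
        make_server("", 8080, app).serve_forever()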

dbalatero 2 days ago | parent [-]

That's hilarious, can I steal the source for my own site?

anileated 2 days ago | parent | next [-]

That’s what LLM would say…

_moof 2 days ago | parent | prev [-]

Only if you aren't a crawler.

E39M5S62 2 days ago | parent [-]

This is a long shot, but are you the same moof that ran the bot 'regurg' on EFnet in the late 90's / early 2000's for the BeOS community?

reactordev 2 days ago | parent | prev | next [-]

I did something similar. In a normal browser it just displays the Matrix rain effect. For a bot, it's a page of links upon links to pages that all link to each other, using a clever PHP script and some .htaccess fun. The fun part is watching the logs to see how long they get stuck; since each link is unique, a crawler can build up a tree structure several GB deep on my server.

I did this once before with an SSH honeypot on my Mesos cluster in 2017.
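
The PHP and .htaccess aren't worth sharing as-is, but the browser-vs-bot split boils down to something like this Python/WSGI sketch of the same idea (the User-Agent pattern is just an example):

    import re
    import uuid

    # Crawlers get an endless page of unique links; everyone else gets the
    # normal page (the Matrix rain effect in my case).
    BOTS = re.compile(r"GPTBot|CCBot|ClaudeBot|Bytespider|crawl|spider", re.I)

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/html")])
        if BOTS.search(environ.get("HTTP_USER_AGENT", "")):
            links = "".join(f'<a href="/maze/{uuid.uuid4().hex}">next</a> '
                            for _ in range(20))
            return [f"<html><body>{links}</body></html>".encode()]
        return [b"<html><body><!-- matrix rain page for humans --></body></html>"]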

phyzome 2 days ago | parent | prev | next [-]

Should be possible to do this with a static site, even.

Here's what I've been doing so far: https://www.brainonfire.net/blog/2024/09/19/poisoning-ai-scr... (serving scrambled versions of my posts to LLM scrapers)
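
The post explains the actual setup; as a generic illustration of "scrambled versions" (not the linked approach), even something as simple as shuffling the words within each sentence at build time keeps a page superficially plausible while making it fairly useless as training data:

    import random

    def scramble(text, seed=0):
        rng = random.Random(seed)  # fixed seed: same scramble on every build
        scrambled = []
        for sentence in text.split(". "):
            words = sentence.split()
            rng.shuffle(words)
            scrambled.append(" ".join(words))
        return ". ".join(scrambled)

    print(scramble("The quick brown fox jumps over the lazy dog. It was bored."))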

gleenn 2 days ago | parent | prev | next [-]

Check out serving a compression bomb too: you can host a file that's tiny for you but decompresses into something massive for crawlers, and with luck it runs them out of RAM and they die. Someone posted about it on HN recently, but I can't immediately find the link.
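
Generating one is trivial: a stream of zeros compresses at roughly 1000:1 with gzip, so a ~10 MB file expands to around 10 GB when a crawler honors Content-Encoding: gzip. A sketch, with illustrative sizes:

    import gzip

    ONE_MB = 1024 * 1024
    chunk = b"\0" * ONE_MB

    # About 10 GB of zeros in, roughly a 10 MB bomb.gz out.
    with gzip.open("bomb.gz", "wb", compresslevel=9) as f:
        for _ in range(10 * 1024):
            f.write(chunk)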

extraduder_ire 2 days ago | parent [-]

It's either this one https://news.ycombinator.com/item?id=44670319 or the comments from this one https://news.ycombinator.com/item?id=44651536

I also recall reading it. In this instance, though, I think wasting their time is more effective than making them crash and give up.

J_McQuade 2 days ago | parent | prev | next [-]

I loved reading about something similar that popped up on HN a wee while back: https://zadzmo.org/code/nepenthes/

fbunnies 2 days ago | parent [-]

I loved reading about something dissimilar that did not pop up on HN yet: https://apnews.com/article/rabbits-with-horns-virus-colorado...

xyzal 2 days ago | parent | prev [-]

Or serve the "Emergent Misalignment" dataset.

https://github.com/emergent-misalignment/emergent-misalignme...