BLKNSLVR 8 hours ago

I really don't know how effective my little system would be against these scrapers, but I've set up a system that blocks IP addresses that have attempted to connect to ports on my system(s) where no services are listening. Those connections must be 'uninvited', which I classify as malicious.
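As a rough sketch of that idea in Python (not how the linked project actually does it; the `UninvitedTracker` class and its method names are made up for illustration, and a real setup would act on firewall logs rather than in-process calls):

```python
from dataclasses import dataclass, field


@dataclass
class UninvitedTracker:
    """Hypothetical helper: blocks any source IP that touches a closed port."""

    # Ports that actually have a service behind them (assumption: just 443).
    open_ports: frozenset = frozenset({443})
    # Source IPs that have been classified as uninvited so far.
    blocked: set = field(default_factory=set)

    def observe(self, src_ip: str, dst_port: int) -> bool:
        """Record one connection attempt; return True if it should be dropped."""
        if src_ip in self.blocked:
            # Already on the blocklist, so drop regardless of port.
            return True
        if dst_port not in self.open_ports:
            # Nothing listens here, so the attempt is treated as uninvited.
            self.blocked.add(src_ip)
            return True
        return False
```

The point of the design is that one probe of a dead port is enough to lose access to the live ones too.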

Since I do actually host a couple of websites / services behind port 443, I can't just block everything that tries to scan my IP address on port 443. However, I've set up Cloudflare in front of those websites, so I do log and block any non-Cloudflare traffic (identified using Cloudflare's ASN: 13335) coming into port 443.
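A minimal Python sketch of the allow-list check, assuming a prefix list instead of an ASN lookup (which is a different mechanism from the one described above). The three prefixes are samples that I believe appear on Cloudflare's published IP list; treat them as placeholders and load the current list yourself. The function name is made up.

```python
import ipaddress

# Sample prefixes only (assumption): verify against Cloudflare's current list.
SAMPLE_CLOUDFLARE_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("173.245.48.0/20", "104.16.0.0/13", "2606:4700::/32")
]


def is_from_allowed_net(src_ip: str, nets=SAMPLE_CLOUDFLARE_NETS) -> bool:
    """Return True if src_ip falls inside any of the allowed networks."""
    addr = ipaddress.ip_address(src_ip)
    # Compare only against networks of the same IP version (v4 vs v6).
    return any(addr.version == net.version and addr in net for net in nets)
```

Anything hitting port 443 for which this returns False would then be logged and added to the blocklist.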

I also log and block IP addresses attempting to connect on port 80, since that's essentially deprecated.

This, of course, does not block traffic coming via the DNS names of the sites, since that will be routed through Cloudflare - but as someone mentioned, Cloudflare has its own anti-scraping tools. And then as another person mentioned, this does require the use of Cloudflare, which is a vast centralising force on the Internet and therefore part of a different problem...

I don't currently split out a separate list for IP addresses that have connected to HTTP(S) ports, but maybe I'll do that over Christmas.

This is my current simple project: https://github.com/UninvitedActivity/UninvitedActivity

Apologies if the README is a bit rambling. It's evolved over time, and it's mostly for me anyway.

P.S. I always thought it was Yog Sothoth (not Sototh). Either way, I'm partial to Nyarlathotep. "The Crawling Chaos" always sounded like the coolest of the elder gods.

ewpratten 8 hours ago

Regarding the Cloudflare part of this, I’d recommend taking a look at “Authenticated Origin Pulls”. It lets you perform your validation at the TLS layer instead of doing it with IP ACLs if that interests you.
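For a sense of what that looks like on the origin side, here is a hedged Python sketch using the standard `ssl` module. It just makes the server demand a client certificate signed by a CA you trust. The function names and file path parameters are hypothetical, and the CA file would be whatever certificate Cloudflare documents for origin pulls.

```python
import ssl


def require_client_cert(ctx: ssl.SSLContext) -> ssl.SSLContext:
    """Make the TLS handshake fail unless the client presents a valid cert."""
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def build_origin_context(server_cert: str, server_key: str,
                         client_ca: str) -> ssl.SSLContext:
    """Server-side context: our own cert, plus the CA that clients must chain to.

    All three arguments are file paths supplied by the caller (hypothetical).
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Certificate and key the origin presents to connecting clients.
    ctx.load_cert_chain(certfile=server_cert, keyfile=server_key)
    # CA used to validate the client certificate sent during the handshake.
    ctx.load_verify_locations(cafile=client_ca)
    return require_client_cert(ctx)
```

With this in place, a scanner connecting straight to the origin's IP fails the handshake even if its source address would have passed an ACL.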