Remix.run Logo
drcongo 4 hours ago

That site doesn't seem to support pages loading either.

edit: I feel their pain - I've spent the past week fighting AI scrapers on multiple sites hitting routes that somehow bypass Cloudflare's cache. Thousands of requests per minute, often to URLs that have never even existed. Baidu and OpenAI, I'm looking at you.

comrade1234 2 hours ago | parent | next [-]

Are they hitting non-existent pages? I had IP addresses scanning my personal server, including hitting pages that don't exist. I had fail2ban running already, so I just turned on the nginx filters (and had to modify the regexes a bit to get them working). I turned on the recidive jail too. It's been working great.
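For reference, a minimal sketch of that setup (jail names like nginx-botsearch and recidive ship with fail2ban, but the paths and ban times below are assumptions; check the filters bundled in /etc/fail2ban/filter.d/ on your distro):

```ini
# /etc/fail2ban/jail.local -- sketch only, adjust paths for your setup

[nginx-botsearch]
# Bans IPs probing for pages that don't exist (404 scans)
enabled  = true
port     = http,https
logpath  = /var/log/nginx/access.log
maxretry = 5

[recidive]
# Re-bans IPs that keep getting banned by the other jails
enabled  = true
logpath  = /var/log/fail2ban.log
bantime  = 1w
findtime = 1d
```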

ndriscoll an hour ago | parent | prev | next [-]

My N100 mini PC can serve over 20k requests per second with nginx (well, it could, if not for the gigabit NIC limiting it). Actually, IIRC it can (again, modulo uplink) do more like 40k rps for 404s or 304s.
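And if you already know the junk routes, nginx can reject them even more cheaply by closing the connection without sending a response at all (444 is an nginx-specific status for exactly this; the location pattern below is a made-up example):

```nginx
# Sketch: drop known scraper probes with no response body.
location ~* ^/(wp-admin|\.env|swagger) {
    return 444;  # nginx closes the connection without replying
}
```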

trollbridge 2 hours ago | parent | prev | next [-]

There is currently some AI scraper that uses residential IP addresses and a variety of techniques to conceal itself that likes downloading Swagger generated docs over… and over… and over.

Plus hitting the endpoints for authentication that return 403 over and over.

tommek4077 2 hours ago | parent | prev | next [-]

Why are "thousands" of requests noticeable in any way? Webservers are so powerful nowadays.

drcongo an hour ago | parent [-]

It's not just one scraper.

jen729w 3 hours ago | parent | prev | next [-]

> often to URLs that have never even existed

Oh you're so deterministic.

mystraline an hour ago | parent | prev [-]

IP blocking Asia took my abusive scans down 95%.

I also do not have a robots.txt, so Google doesn't index.

Got some scanners that left a message about how to index or de-index, but that was like 3 lines total in my log (that's not abusive).

But yeah, blocking the whole of Asia stopped soooo much of the net-shit.
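At its core, a country-level block is just a CIDR-list membership check; a minimal sketch in Python (the sample ranges below are documentation placeholders, not a real country list -- real lists come from the RIR delegation files):

```python
import ipaddress

# Placeholder CIDR blocks standing in for a country IP list.
BLOCKED_NETS = [
    ipaddress.ip_network("203.0.113.0/24"),   # TEST-NET-3, example only
    ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2, example only
]

def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETS)

print(is_blocked("203.0.113.7"))  # True
print(is_blocked("192.0.2.1"))    # False
```

In practice you'd load the ranges into an nftables or ipset set rather than check them in application code, so the kernel drops the packets before nginx ever sees them.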

blenderob 23 minutes ago | parent [-]

> I also do not have a robots.txt, so Google doesn't index.

That doesn't sound right. I don't have a robots.txt either, but Google indexes everything for me.

mystraline 11 minutes ago | parent [-]

https://news.ycombinator.com/item?id=46681454

I think this is a recent change.

daveoc64 a minute ago | parent [-]

All the comments there seem to suggest that there has been no change and that robots.txt isn't required.