| ▲ | huijzer 9 hours ago |
| Yep, that's also my experience. Except HN, because it does not use Cloudflare; it knows it is not necessary. I just wrote a blog post titled "Do Not Put Your Site Behind Cloudflare if You Don't Need To" [1]. [1]: https://huijzer.xyz/posts/123/ |
|
| ▲ | firecall 8 hours ago | parent | next [-] |
| Sadly, AI bots and crawlers have made CF the only affordable way to actually keep my sites up without incurring excessive image serving costs. Those TikTok AI crawlers were destroying some of my sites. Millions of images served to ByteSpider bots, over and over again.
They wouldn't stop. It was relentless abuse. :-( Now I've just blocked them all with CF. |
| |
| ▲ | flakeoil 8 hours ago | parent | next [-] | | > Now I've just blocked them all with CF. Yeah, they for sure let nothing through right now. ;) | | | |
| ▲ | zenmac 8 hours ago | parent | prev | next [-] | | Wouldn't it be trivial to just write a ufw rule to block the crawler IPs? At times like this I'm really glad we self-hosted. | | |
| ▲ | cornedor 8 hours ago | parent | next [-] | | No, there are simply too many. For an e-commerce site I work for, we once had an issue where some bad actor tried to crawl the site to set up scam shops. The list of IPs was way too broad, and the user agents way too generic or random. | | |
| ▲ | 72deluxe 8 hours ago | parent [-] | | Could you not also use an ASN list like https://github.com/brianhama/bad-asn-list and add blocks of IPs to a blocklist (eg. ipset on Linux)? Most of the scripty traffic comes from VPSs. | | |
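For illustration, a sketch of turning such a list into an ipset blocklist. It assumes a prefixes.txt produced by mapping the ASNs in bad-asn-list to their announced prefixes (that mapping step, the `scrapers` set name, and the example CIDRs are all made up here):

```shell
# Build an `ipset restore` file from a list of CIDR prefixes, one per line.
# prefixes.txt stands in for the output of an ASN-to-prefix lookup.
printf '%s\n' "198.51.100.0/24" "203.0.113.0/24" > prefixes.txt

echo "create scrapers hash:net -exist" > scrapers.restore
sed 's/^/add scrapers /; s/$/ -exist/' prefixes.txt >> scrapers.restore

# Then, as root:
#   ipset restore < scrapers.restore
#   iptables -I INPUT -m set --match-set scrapers src -j DROP
```

The `-exist` flags make the restore idempotent, so the same file can be re-applied when the prefix list is refreshed.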
| ▲ | jeroenhd 6 hours ago | parent [-] | | Thanks to widespread botnets, most scrapers fall back to using "residential proxies" the moment you block their cloud addresses. Same load, but now you risk accidentally blocking customers coming from similar net blocks. Blocking ASNs is one step of the fight, but unfortunately it's not the solution. | | |
| ▲ | immibis an hour ago | parent [-] | | Hypothetically, as a cyber-criminal, I'd like to thank the blacklist industry for bringing so much money into criminal enterprises by making residential proxies mandatory for all scraping. |
|
|
| |
| ▲ | tpetry 8 hours ago | parent | prev | next [-] | | It's not one IP to block. It's thousands! And they're also scattered across different IP networks, so no simple CIDR block is possible. Oh, and just for fun, when you block their datacenter IPs they switch to hundreds of residential network IPs. Yes, they are really hard to block. In the end I switched to Cloudflare just so they can handle this mess. | |
| ▲ | Bender 7 hours ago | parent | prev | next [-] | | > Wouldn't it be trivial to just write a ufw rule to block the crawler IPs? Probably more effective would be to get the bots to exclude your IP/domain. I do this for SSH, leaving it open on my public SFTP servers on purpose. [1] If I can get 5 bot owners to exclude me, that could be upwards of 250k+ nodes, mostly mobile IPs, that stop talking to me. Just create something that confuses and craps up the bots. With SSH bots this is trivial, as most SSH bot libraries and code are unmaintained and poorly written to begin with. In my SSH example, look for the VersionAddendum. Old versions of ssh, old ssh libraries, and code that tries to implement ssh itself will choke on a long banner string. Not to be confused with the text banner file. I'm sure the clever people here could make something similar for HTTPS and especially for GPT/LLM bots, at the risk of being flagged "malicious". [1] - https://mirror.newsdump.org/confuse-some-ssh-bots.html About 90%+ of bots cannot visit this URL, including real people who have disabled HTTP/2.0 in their browser. | |
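For reference, the VersionAddendum knob is a one-line sshd_config change; the padding string below is just an illustrative placeholder:

```
# /etc/ssh/sshd_config (fragment)
# VersionAddendum is appended to the SSH protocol version banner sent
# before key exchange; poorly written client libraries choke on an
# unusually long one. (Distinct from the Banner text-file option.)
VersionAddendum XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
```

Reload sshd after the change and verify with `nc host 22`, which prints the full banner line.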
| ▲ | firecall 8 hours ago | parent | prev [-] | | Maybe :-) But for a small operation, AKA just me, it's one more thing for me to get my head around and manage. I don't run just one website or one service. It's hundreds of sites across multiple platforms! Not sure I could ever keep up playing AI Crawler and IP Whack-A-Mole! |
| |
| ▲ | UltraSane 41 minutes ago | parent | prev | next [-] | | Can you use per-IP rate limiting? | |
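For what it's worth, nginx can do per-IP rate limiting in a few lines; a sketch with illustrative zone name, rate, and backend address:

```nginx
# Allow roughly 10 requests/second per client IP, with a burst of 20;
# requests beyond the burst are rejected (503 by default, tunable via
# limit_req_status).
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

server {
    listen 80;
    location / {
        limit_req zone=perip burst=20 nodelay;
        proxy_pass http://127.0.0.1:8080;
    }
}
```

The caveat raised elsewhere in the thread applies: scrapers rotating through thousands of IPs sail under any per-IP limit.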
| ▲ | immibis an hour ago | parent | prev | next [-] | | How many requests is your site getting, and how long does your site require to process a request, and why is it that long? | |
| ▲ | unethical_ban 6 hours ago | parent | prev | next [-] | | I don't understand. What exactly are they doing, what are their goals? I'm not trying to argue, I genuinely don't get it. edit: I guess I understand "AI bots scraping sites for data to feed LLM training" but what about the image serving? | |
| ▲ | Aeolun 8 hours ago | parent | prev [-] | | > Now I've just blocked them all with CF. You realize it was possible to block bad actors before Cloudflare, right? They just made it easier; they didn't make it possible in the first place. | | |
| ▲ | firecall 8 hours ago | parent | next [-] | | Of course :-) And my image CDN blocked ByteSpider for me. For a while I also blocked the entirety of Singapore due to all the bots coming out of AWS over there! But it's honestly something I just don't need to be thinking about for every single site I run across a multitude of platforms. Having said that, I will now look at the options for the business-critical services I operate for clients! | |
| ▲ | delfinom 7 hours ago | parent | prev [-] | | Bad actors now have access to tens of thousands of IPs and servers on the fly. The cost of hardware and software resources these days is absolute peanuts compared to 10 years ago. Cloud services and APIs have made managing them trivial as hell. Cloudflare is simply an evolution in response to the other side also having evolved greatly, both legitimate and illegitimate users. |
|
|
|
| ▲ | MinimalAction 8 hours ago | parent | prev | next [-] |
| Yes, I never understood this obsession with centralized services like Cloudflare. To be fair, though, if our tiny blogs only get a hundred or so visitors monthly, does it matter if they have an outage for a day? |
| |
| ▲ | ThunderSizzle 8 hours ago | parent [-] | | I think part of it is that not having to worry about certs is a nice reason to hide behind the proxy. Also, to help hide your IP address, I guess. Of course, on the other hand, I know that relying on Cloudflare's certs is basically inviting a MITM attack. | | |
| ▲ | huijzer 8 hours ago | parent | next [-] | | > I think partially is not having to worry about certs is a nice reason to hide behind the proxy. Use Caddy. I never worry about certs. | | |
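For anyone who hasn't seen it, a minimal Caddyfile sketch (domain and backend port are placeholders); Caddy provisions and renews the TLS certificate on its own:

```
example.com {
    reverse_proxy 127.0.0.1:8080
}
```

That's the whole config: Caddy sees a public hostname, obtains a certificate via ACME, redirects HTTP to HTTPS, and proxies to the backend.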
| ▲ | ThunderSizzle 7 hours ago | parent | next [-] | | Interesting. I've done a lot of manual work to set up a whole nginx layer to properly route stuff through one domain to various self-hosted services, with way too many hard lessons when I started this journey (from trying to do manual setup without Docker to moving on to repeatable setups via Docker, etc.). The setup appears very simple in Caddy; amazingly simple, honestly. I'm going to give it a good try. | |
| ▲ | immibis an hour ago | parent | prev [-] | | Or certbot-plugin-nginx if you prefer a bit less magic. |
| |
| ▲ | ptx 7 hours ago | parent | prev [-] | | Don't you need a cert anyway to secure the connection from Cloudflare to your server? | | |
| ▲ | omcnoe 7 hours ago | parent | next [-] | | Cloudflare explicitly supports customers placing insecure HTTP-only sites behind Cloudflare HTTPS. It's one of the more controversial parts of the business: it makes the fact that the traffic is unencrypted on public networks invisible to the end user. | |
| ▲ | ThunderSizzle 7 hours ago | parent | prev [-] | | You could use a self-signed cert, since cloudflare doesn't care about that. |
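A self-signed origin certificate is a one-liner with openssl (filenames and CN here are illustrative). Cloudflare's "Full" mode accepts it, while "Full (strict)" requires a CA-signed or Cloudflare Origin CA certificate:

```shell
# Generate a self-signed certificate and matching private key for the
# origin server, valid for one year, without prompting for details.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout origin.key -out origin.crt -subj "/CN=example.com"
```

Point the web server at origin.key/origin.crt and the Cloudflare-to-origin leg is at least encrypted, even if not validated against a trusted CA.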
|
|
|
|
| ▲ | ramon156 8 hours ago | parent | prev | next [-] |
| Last time I tried this I got DDoS'd, so I don't see a reason to step away from CF. That said, this is the price I pay. |
|
| ▲ | Illniyar 8 hours ago | parent | prev | next [-] |
| Does HN not experience DDoS attacks? I would imagine that, being as popular as it is, it would. |
| |
|
| ▲ | zzzeek 8 hours ago | parent | prev [-] |
| ~~two~~ three comments on that: 1. DDoS protection is not the only thing anymore; I use Cloudflare because of vast amounts of AI bots from thousands of ASNs around the world crawling my CI servers (bloated Java VMs on very undersized hosts) and bringing them down (granted, I threw Cloudflare onto my static sites as well, which was not really necessary; I just liked their analytics UX) 2. the XKCD comic is misinterpreted there; that little block is small because it's a "small open source project run by one person", and Cloudflare is the opposite of that 3. edit: also Cloudflare is awesome if you are migrating hosts. I did a migration this past month: you point Cloudflare at the new servers and it's instant DNS propagation (since you didn't propagate anything :) ) |
| |
| ▲ | dboreham 7 hours ago | parent [-] | | Why are your CI servers open to the public network? | | |
| ▲ | zzzeek 6 hours ago | parent [-] | | because we're an open source project that accepts pull requests on github and we'd like our PR submitters to see why their PRs are failing tests |
|
|