Remix.run Logo
freedomben 5 hours ago

I've had the same experience.

Philosophically I think it's terrible that Cloudflare has become a middleman in a huge and important swath of the internet. As a user, it largely makes my life much worse. It limits my browser, my ability to protect myself via VPNs, etc, and I am just browsing normally, not attacking anything. Pragmatically though, as a webmaster/admin/whatever you want to call it nowadays, Cloudflare is basically a necessity. I've started putting things behind it because if I don't, 99%+ of my traffic is bots, and often bots clearly scanning for vulnerabilities (I run mostly zero PHP sites, yet my traffic logs are often filled with requests like /admin.php and /wp-admin.php and all the wordpress things, and constant crawls from clearly not search engines that download everything and use robots.txt as a guide of what to crawl rather than what not to crawl. I haven't been DDoSed yet, but I've had images and PDFs and things downloaded so many times by these things that it costs me money. For some things where I or my family are the only legitimate users, I can just firewall-cmd all IPs except my own, but even then it's maintenance work I don't want to have to do.

I've tried many of the alternatives, and they often fail even on legitimate usecases. I've been blocked more by the alternatives than I have by Cloudflare, especially that one that does a proof of work. It works about 80% of the time, but that 20% is really, really annoying to the point that when I see that scren pop up I just browse away.

It's really a disheartening state we find ourselves in. I don't think my principles/values have been tested more in the real world than the last few years.

rglullis 4 hours ago | parent | next [-]

Either I am very lucky or what I am doing has zero value to bots, because I've been running servers online for at least 15 years, and never had any issue that couldn't be solved with basic security hygiene. I use cloudflare as my DNS for some servers, but I always disable any of their paid features. To me they could go out of business tomorrow and my servers would be chugging along just fine.

j16sdiz 3 hours ago | parent | next [-]

Sometime it is not security , it could be just bandwidth or CPU.

I have website small enough not to attract too many bot, but sometime, something very innocent can bring my website down.

For example, I put a php ical viewer.. and some crawler start loading the calendar page, taking up all the cpu cycle.

rglullis 2 hours ago | parent [-]

Even the most minimal protection would stop that.

4 hours ago | parent | prev [-]
[deleted]
dspillett 3 hours ago | parent | prev | next [-]

> and use robots.txt as a guide of what to crawl rather than what not to crawl

Mental note, make sure my robots.txt files contain a few references to slowly returning pages full of almost nonsense that link back to each other endlessly…

Not complete nonsense, that would be reasonably easy to detect and ignore. Perhaps repeats of your other content with every 5th word swapped with a random one from elsewhere in the content, every 4th word randomly misspelt, every seventh word reversed, every seventh sentence reversed, add a random sprinkling of famous names (Sir John Major, Arc de Triomphe, Sarah Jane Smith, Viltvodle VI) that make little sense in context, etc. Not enough change that automatic crap detection sees it as an obvious trap, but more than enough that ingesting data from your site into any model has enough detrimental effect to token weightings to at least undo any beneficial effect it might have had otherwise.

And when setting traps like this, make sure the response is slow enough that it won't use much bandwidth, and the serving process is very lightweight, and just in case that isn't enough make sure it aborts and errors out if any load metric goes above a given level.

matrss 3 hours ago | parent | next [-]

So, basically iocaine (https://iocaine.madhouse-project.org/). It has indeed been very useful to get the AI scraper load on a server I maintain down to a reasonable level, even with its not so strict default configuration.

willx86 3 hours ago | parent | next [-]

https://blog.cloudflare.com/ai-labyrinth/

A bit like this? ( iocaine is newer)

matrss an hour ago | parent | next [-]

First time seeing that, but yes, seems similar in concept. Iocaine can be self-hosted and put in as a "middleware" in your reverse proxy with a few lines of config, cloudflare's seems tied to their services. Cloudflares also generates garbage with generative models, while iocaine uses much simpler (and surely more "crude") methods of generating its garbage. Using LLMs to feed junk to LLMs just makes me cry, so much wasted compute.

Is iocaine actually newer though? Its first commit dates to 2025-01, while the blog post is from 2025-03. I couldn't find info on when Cloudflare started theirs. There's also Nepenthes, which had its first release in 2025-01 too.

johnisgood 2 hours ago | parent | prev [-]

If I think about it, I find it awful. The fact that we need to put junk in our own stuff just for crawlers does not sit well with me.

2 hours ago | parent | prev [-]
[deleted]
freedomben 2 hours ago | parent | prev [-]

Hot damn, this is a great idea! Reminds me fondly of an old project a friend and I built that looks like an SSH prompt or optionally an unauthed telnet listener, which looks and feels enough like a real shell that we would capture some pretty fascinating sessions of people trying to explore our system or load us with malware. Eventually somebody figured it out and then DDoSed the hell out of our stuff and would not stop hassling us. It was a good reminder that yanking people's chains sometimes really pisses them off and can attract attention and grudges that you really don't want. My friend ended up retiring his domain because he got tired of dealing with the special attention. It did allow us to capture some pretty fascinating data though that actually improved our security while it lasted.

dwedge 3 hours ago | parent | prev | next [-]

While I sympathise, I disagree with your stance. Cloudflare handle a large % of the Internet now because of people putting sites that, as you admitted, don't need to be behind it there.

kitsune1 3 hours ago | parent [-]

[dead]

4 hours ago | parent | prev [-]
[deleted]