ljm 4 hours ago

Is Cloudflare becoming a mob outfit? Because they are selling scraping countermeasures but are now selling scraping too.

And they can pull it off because of their reach over the internet with the free DNS.

pocksuppet 16 minutes ago | parent | next [-]

Was it ever not one? They protect a lot of DDoS-for-hire sites from DDoS by their competitors. In return they increase the quantity of DDoS on the internet. They offer you a service for $150, then months later suddenly demand $150k in 24 hours or they shut down your business. If you use them as a DNS registrar they will hold your domain hostage.

shadowfiend 3 hours ago | parent | prev | next [-]

No: https://developers.cloudflare.com/browser-rendering/rest-api...

oefrha 19 minutes ago | parent [-]

That's not the perfect defense you think it is. Plenty of robots.txts[1] technically allow scraping their main content pages as long as your user-agent isn't explicitly disallowed, but in practice they're behind Cloudflare, so they still throw up a Cloudflare bot check if you actually attempt to crawl.

And forget about crawling. If you have a less reputable IP (basically every IP in third world countries is less reputable, for instance), you can be CAPTCHA'ed to no end by Cloudflare even as a human user, on the default setting, so plenty of site owners with more reputable home/office IPs don't even know what they subject a subset of their users to.

[1] E.g. https://www.wired.com/robots.txt to pick an example high up on HN front page.

iso-logi 4 hours ago | parent | prev | next [-]

Their free DNS is only a small piece of the pie.

The fact that 30%+ of the web relies on their caching services, routability services and DDoS protection services is the main pull.

Their DNS is only really for data collection and to front as "good will"

theamk 4 hours ago | parent | prev | next [-]

no? it takes 10 seconds to check:

> The /crawl endpoint respects the directives of robots.txt files, including crawl-delay. All URLs that /crawl is directed not to crawl are listed in the response with "status": "disallowed".

You don't need any scraping countermeasures for crawlers like those.
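For anyone who wants to see what "respects robots.txt, including crawl-delay" looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. This is not Cloudflare's implementation. The robots.txt content, the `ExampleBot` user agent, the `plan_crawl` helper, and the "allowed" status value are all assumptions for illustration; only the "disallowed" status string comes from the quoted docs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def plan_crawl(robots_txt, user_agent, urls):
    """Return (crawl_delay, per-URL statuses) for a polite crawler.

    Hypothetical helper: it only decides what may be fetched and
    performs no network requests itself.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # crawl_delay() returns None when no Crawl-delay applies to this agent.
    delay = parser.crawl_delay(user_agent)

    results = []
    for url in urls:
        allowed = parser.can_fetch(user_agent, url)
        # "disallowed" mirrors the quoted docs; "allowed" is an assumption.
        results.append({"url": url, "status": "allowed" if allowed else "disallowed"})
    return delay, results


if __name__ == "__main__":
    delay, results = plan_crawl(
        ROBOTS_TXT,
        "ExampleBot",  # hypothetical user agent
        ["https://example.com/public", "https://example.com/private/x"],
    )
    print(delay)
    for row in results:
        print(row)
```

A crawler built this way would sleep for `delay` seconds between requests and skip every URL marked "disallowed", which is the behavior the docs describe.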

Macha 3 hours ago | parent [-]

So what’s the user agent for their bot? They don’t seem to specify the default in the docs and it looks like it’s user configurable. So yet another opt out bot which you need your web server to match on special behaviour to block

gruez 3 hours ago | parent [-]

>So yet another opt out bot which you need your web server to match on special behaviour to block

Given that malicious bots are allegedly spoofing real user agents, "another user agent you have to add to your list" seems like the least of your problems.

subscribed 3 hours ago | parent | prev | next [-]

I think there's some space between being absolutely snuffed out by everyone's countless bots, which ignore everything and pull from residential proxies, and this supposedly slower, well-behaved, smarter bot.

Like there's a difference between dozens of drunk teenagers thrashing the city streets in an illegal street race vs a taxi driver.

its-kostya 4 hours ago | parent | prev | next [-]

Cloudflare has been trying to mediate between publishers & AI companies. If publishers are behind Cloudflare and Cloudflare's bot detection stops scrapers at the request of publishers, the publishers can allow their data to be scraped (via this endpoint) for a price. It creates market scarcity. I don't believe the target audience is you and me. Unless you own a very popular blog that AI companies would pay you for.

giancarlostoro 3 hours ago | parent | prev | next [-]

If they ever sell or the CEO changes, yes. In the meantime, they have not given any strong indication that they're trying to bully anybody. I could see things changing drastically if the people in charge are swapped out.

rrr_oh_man 4 hours ago | parent | prev | next [-]

It’s a three-letter agency front.

stri8ted 4 hours ago | parent | next [-]

Do you have any evidence to support this view?

pocksuppet 16 minutes ago | parent | next [-]

Who else would MITM 30% of the internet?

rolymath 3 hours ago | parent | prev [-]

Read about who founded it and how. It's not a secret at all.

mtmail 4 hours ago | parent | prev [-]

Any kind of source for the claim?

Retr0id 4 hours ago | parent | prev [-]

For a long time Cloudflare has proudly protected DDoS-as-a-service sites (but of course, they claim they don't "host" them).

Dylan16807 17 minutes ago | parent [-]

Are you using the word "claim" to call them wrong or for a more confusing reason?

Because I'm pretty sure they are not in fact wrong.

Retr0id a minute ago | parent [-]

The distinction between a caching proxy and an origin server is pretty meaningless when you're serving static content, if you ask me.