greatgib 4 hours ago

All as expected. First they run a huge campaign to call out evil scrapers: you should use their service to make sure your website blocks LLMs and bots from scraping it. Look how bad it is.

And once that is well set up and they have their walled garden, they can present their own API for scraping websites, all nicely packaged for use by your LLM. But as you know, they are the gatekeeper, so the Mafia boss decides what "intermediary" fee is proper for letting you do what you were doing without an intermediary before.

shadowfiend 3 hours ago | parent [-]

No: https://developers.cloudflare.com/browser-rendering/rest-api...

greatgib 2 hours ago | parent | next [-]

That is funny because on this page there is a warning block with the following text:

   Refer to Will Browser Rendering bypass Cloudflare's Bot Protection? for instructions on creating a WAF skip rule.

And "Will Browser Rendering bypass Cloudflare's Bot Protection?" is a hash link to the FAQ page, which surprisingly doesn't have anything available for that link entry.

Is it because it was removed (or hidden), or because it won't be available until everyone forgets the "we are not evil, we are here to protect the internet" line?

x0x0 3 hours ago | parent | prev [-]

most websites, particularly those behind cloudflare, are very restrictive even to crawlers that obey robots. Proof: a ton of my time over the last year, and my crawlers very carefully obey robots.

It's hard to see how this isn't extorting folks by offering a working solution that, oh, cloudflare doesn't block. As long as you pay Cloudflare.

Perhaps I'm overly cynical, but I'd be quite surprised if cloudflare subjected their own headless browsing to the same rules the rest of the internet gets.
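
For readers wondering what "obeying robots" looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, and the example.com URLs are all made up for illustration, and the file is inlined so nothing is fetched over the network. Note that passing this check says nothing about whether a WAF or bot-management layer will still block the request, which is the complaint above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# Crawl-delay is a common but non-standard directive.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network I/O."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up user agent; a real crawler would use its own name.
for url in ("https://example.com/public/page", "https://example.com/private/data"):
    verdict = "allowed" if may_fetch(parser, "ExampleBot", url) else "disallowed"
    print(url, verdict)
```

A polite crawler would run a check like this before every request and also honor `parser.crawl_delay(...)` when the site sets one.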

gruez 3 hours ago | parent [-]

>most websites, particularly those behind cloudflare, are very restrictive even to crawlers that obey robots. Proof: a ton of my time over the last year, and my crawlers very carefully obey robots.

The docs are pretty unequivocal though:

>If you use Cloudflare products that control or restrict bot traffic such as Bot Management, Web Application Firewall (WAF), or Turnstile, the same rules will apply to the Browser Rendering crawler.

It's not just robots.txt. Most (all?) restrictions that apply to outside bots apply to cloudflare's bot as well, at least that's what they're claiming. If they're being this explicit about it, I'm willing to give them the benefit of the doubt until there's evidence to the contrary, rather than being a cynic and assuming the worst.