Remix.run Logo
gruez 3 hours ago

>most websites, particularly those behind cloudflare, are very restrictive even to crawlers that obey robots. Proof: a ton of my time over the last year, and my crawlers very carefully obey robots.

The docs are pretty equivocal though:

>If you use Cloudflare products that control or restrict bot traffic such as Bot Management, Web Application Firewall (WAF), or Turnstile, the same rules will apply to the Browser Rendering crawler.

It's not just robots.txt. Most (all?) restrictions that apply to outside bots apply to cloudflare's bot as well, at least that's what they're claiming. If they're being this explicit about it, I'm willing to give them the benefit of the doubt until there's evidence to the contrary, rather than being a cynic and assuming the worst.