hombre_fatal 2 hours ago

It depends on what your goal is.

Having to use a browser to crawl your site will slow down naive crawlers at scale.

But it wouldn't do much against individuals typing "what is a kumquat" into their local LLM tool, which then issues 20 requests to answer the question. They're not going to care or even notice if the tool had to use a Playwright instance instead of curl.

Yet according to Cloudflare, it's that use case that is responsible for ~all of my AI bot traffic, which is 30x the traffic of direct human users. In my case, being a forum, it made more sense to just block the traffic.
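
For a rough idea of what blocking can look like at the application level, here is a minimal sketch. It is not how Cloudflare classifies traffic. It assumes the agent announces itself in the User-Agent header, and the marker list and function names are hypothetical choices for illustration (the strings are user-agent tokens that some AI crawlers and agents are known to send, not an exhaustive list).

```python
# Assumption: these substrings appear in the User-Agent header of some
# self-identifying AI crawlers/agents. Illustrative only, not exhaustive.
AI_AGENT_MARKERS = ("GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "CCBot")


def is_self_identified_ai_agent(user_agent: str) -> bool:
    """Return True if the User-Agent contains one of the known markers."""
    ua = user_agent.lower()
    return any(marker.lower() in ua for marker in AI_AGENT_MARKERS)


def status_for(user_agent: str) -> int:
    """Hypothetical helper: HTTP status to answer with for a request."""
    # 403 Forbidden for self-identified AI agents, 200 OK for everyone else.
    return 403 if is_self_identified_ai_agent(user_agent) else 200
```

This only catches agents that identify themselves. A tool driving a real browser with a stock user agent passes straight through, so it is a floor, not a replacement for a bot-management service.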

ethmarks 2 hours ago

Maybe a stupid question, but how can Cloudflare detect what portion of traffic is coming from LLM agents? Do agents identify themselves when they make requests? Or are you just assuming that all Playwright traffic originates from an agent?