cullumsmith 8 hours ago

I simply block all AI crawlers with a user-agent check in nginx.conf.
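
The comment doesn't show the actual nginx.conf rule. In nginx this is typically a `map` on `$http_user_agent`, or an `if` with a case-insensitive `~*` regex, that returns 403. Here is a rough sketch of the same matching logic in Python, assuming a hand-maintained token list. The tokens are illustrative examples, not a complete or authoritative list:

```python
from typing import Optional

# Illustrative, NOT exhaustive: user-agent substrings that some AI crawlers
# announce. A real block list needs regular updating.
AI_CRAWLER_TOKENS = ("gptbot", "claudebot", "ccbot", "perplexitybot", "bytespider")


def is_ai_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a listed crawler token."""
    if not user_agent:
        return False  # a missing header is not treated as a crawler here
    ua = user_agent.lower()  # case-insensitive, like nginx's ~* operator
    return any(token in ua for token in AI_CRAWLER_TOKENS)


def status_for(user_agent: Optional[str]) -> int:
    """Mimic a server rule that answers 403 to matching agents, 200 otherwise."""
    return 403 if is_ai_crawler(user_agent) else 200


if __name__ == "__main__":
    print(status_for("Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"))  # 403
    print(status_for("Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)"))  # 200
```

This only catches crawlers that identify themselves honestly in the User-Agent header.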

microtonal 8 hours ago

I also block all AI crawlers. I don't see why I should give them my content so they can rip it off and make money from it through training or agents. Sadly, a lot of AI companies try to make their requests indistinguishable from regular browser traffic on residential connections, so unfortunately I have to use Cloudflare to block them.

Ideally I'd make the content available to crawlers for training open models, but that seems to be nearly impossible. It would be possible if other AI companies behaved.

Barbing 8 hours ago

>so unfortunately I have to use Cloudflare to block them.

That can’t block Grok, can it?

(If you ask Grok to retrieve information from your site, you might see a fake iPhone or something similar visit it.)

tardedmeme 7 hours ago

What's the IP address of the supposed iPhone? Does it come from T-Mobile or from xAI?
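
One way to answer that from your own logs is to check the client address against the IP ranges each party is known to use (some crawler operators publish theirs; otherwise whois/ASN data is the usual source). A minimal Python sketch using the standard `ipaddress` module; the owner names and ranges are hypothetical placeholders taken from the reserved documentation blocks, not real xAI or T-Mobile allocations:

```python
import ipaddress

# HYPOTHETICAL ranges: documentation prefixes standing in for whatever ranges
# an operator or carrier actually uses. Replace with real, verified data.
KNOWN_RANGES = {
    "example-ai-provider": [ipaddress.ip_network("192.0.2.0/24")],
    "example-mobile-carrier": [
        ipaddress.ip_network("198.51.100.0/24"),
        ipaddress.ip_network("2001:db8::/32"),
    ],
}


def classify_ip(addr: str) -> str:
    """Return the owner label whose range contains addr, or 'unknown'."""
    ip = ipaddress.ip_address(addr)  # raises ValueError on malformed input
    for owner, networks in KNOWN_RANGES.items():
        # Membership is simply False when IPv4/IPv6 versions differ.
        if any(ip in net for net in networks):
            return owner
    return "unknown"


if __name__ == "__main__":
    for a in ("192.0.2.45", "2001:db8::1", "203.0.113.7"):
        print(a, classify_ip(a))
```

A match only tells you who owns the address. Traffic relayed through a residential proxy will still look like it comes from the carrier.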

Barbing 6 hours ago

Residential, I thought? It might even have been someone on here who posted about watching their server logs while messaging Grok themselves.

Curious if xAI has a phone farm. Maybe just running simulators on servers?

tardedmeme 7 hours ago

This works for a few weeks to months. Then they detect that your site is hostile to them and enable evasion mode, with random IP addresses and user-agent strings. Proxies are expensive, so at least they're losing money.

orf 8 hours ago

*Some AI crawlers. Not many.

robhoeijmakers 8 hours ago

I started blocking some of them, but for now I want to improve visibility before blocking or optimising further. The dashboard helps with this.
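
The dashboard itself isn't described here. As a hypothetical stand-in for that kind of visibility, here is a short Python sketch that tallies requests per crawler token from access-log lines. It assumes nginx's default `combined` log format, where the User-Agent is the last quoted field, and reuses an illustrative, non-exhaustive token list:

```python
import re
from collections import Counter

# Assumes nginx "combined" format: the User-Agent is the last quoted field.
UA_AT_END = re.compile(r'"([^"]*)"\s*$')

# Illustrative, NOT exhaustive token list.
AI_CRAWLER_TOKENS = ("gptbot", "claudebot", "ccbot", "perplexitybot", "bytespider")


def tally_crawlers(lines):
    """Count requests per crawler token; everything else is counted as 'other'."""
    counts = Counter()
    for line in lines:
        m = UA_AT_END.search(line)
        if not m:
            continue  # skip lines that don't end with a quoted User-Agent
        ua = m.group(1).lower()
        for token in AI_CRAWLER_TOKENS:
            if token in ua:
                counts[token] += 1
                break
        else:  # no token matched this User-Agent
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '192.0.2.1 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
        '198.51.100.2 - - [10/Oct/2024:13:55:37 +0000] "GET /feed HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    ]
    print(tally_crawlers(sample))
```

In practice you would pass an open log file instead of the sample list. As noted earlier in the thread, crawlers that spoof browser user-agents end up under "other".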