cullumsmith 8 hours ago
I simply block all AI crawlers with a user-agent check in nginx.conf.
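For readers unfamiliar with the approach, a user-agent check of this kind could look roughly like the sketch below. It is not the poster's actual configuration: the crawler tokens listed, the `$block_ai_crawler` variable name, and the `example.com` server name are assumptions chosen for illustration, and any real list would need to be maintained against the crawlers you actually see in your logs.

```nginx
# Goes in the http {} context: derive a flag from the User-Agent header.
# The variable name $block_ai_crawler is hypothetical.
map $http_user_agent $block_ai_crawler {
    default 0;
    # Case-insensitive regex match; the token list is illustrative only.
    ~*(GPTBot|ClaudeBot|CCBot|Bytespider) 1;
}

server {
    listen 80;
    server_name example.com;  # placeholder

    # Refuse requests whose User-Agent matched one of the tokens above.
    if ($block_ai_crawler) {
        return 403;
    }
}
```

This only stops crawlers that identify themselves honestly; a client that sends a browser-like user-agent string passes straight through.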
microtonal 8 hours ago
I also block all AI crawlers. I am not sure why I should give them my content so they can rip it off and make money from it through training or agents. Sadly, a lot of AI companies are trying to make their requests indistinguishable from those of regular browsers on residential connections, so I have to use Cloudflare to block them. Ideally I'd make the content available to crawlers for training open models, but that seems to be nearly impossible. It would be possible if the other AI companies behaved.
tardedmeme 7 hours ago
This works for a few weeks to months. Then they detect that your site is hostile to them and enable evasion mode, with random IP addresses and user-agent strings. Proxies are expensive, so at least they're losing money.
orf 8 hours ago
*some AI crawlers. Not many.
robhoeijmakers 8 hours ago
I started blocking some of them. But for now I want to improve visibility before blocking or optimising any further. The dashboard helps with this.