8organicbits an hour ago

Does anyone know the best practices for keeping AI crawlers off your RSS feeds? I know robots.txt works for the well-behaved bots. Other tools, like interstitial captchas, don't work because feed readers break if you send them anything but XML.

Putting just the post intro in the feed and linking to the website feels like a safer approach, assuming you have bot protections on the website, but that's a poor experience for people who want to read in their feed reader.

solid_fuel 7 minutes ago | parent [-]

I have some aggressive filters in Caddy that block the worst offenders by CIDR range, and also filter by user agent to remove the Facebook and Amazon bots that identify themselves honestly. Otherwise, maybe strong rate limits by IP?
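
To make the idea concrete, here is a rough sketch of the same three checks (CIDR block, user-agent filter, per-IP rate limit) in plain Python rather than Caddy config. Everything in it is an assumption for illustration: the CIDR ranges are documentation-only placeholders, the user-agent substrings are examples you would need to verify against each operator's published docs, and the limits are arbitrary.

```python
import ipaddress
from collections import defaultdict, deque
import time

# Placeholder ranges (RFC 5737 documentation blocks), NOT real crawler ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]

# Example substrings only; verify against each crawler's documented user agent.
BLOCKED_UA_SUBSTRINGS = ("facebookexternalhit", "amazonbot")

# Arbitrary limits: at most MAX_REQUESTS per IP in any WINDOW_SECONDS span.
MAX_REQUESTS = 5
WINDOW_SECONDS = 60.0

# Timestamps of recently allowed requests, keyed by client IP.
_recent_hits = defaultdict(deque)


def allow_request(ip, user_agent, now=None):
    """Return True if the feed request should be served, False if blocked."""
    now = time.monotonic() if now is None else now

    # 1. Block by CIDR range.
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return False

    # 2. Block bots that announce themselves in the User-Agent header.
    ua = user_agent.lower()
    if any(fragment in ua for fragment in BLOCKED_UA_SUBSTRINGS):
        return False

    # 3. Sliding-window rate limit per IP.
    hits = _recent_hits[ip]
    while hits and now - hits[0] >= WINDOW_SECONDS:
        hits.popleft()
    if len(hits) >= MAX_REQUESTS:
        return False
    hits.append(now)
    return True
```

The user-agent check only catches bots that are honest about who they are, and the in-memory rate limiter resets on restart, so treat this as a starting point rather than real protection.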

Edit:

Longer term, the approach might be to provide a separate RSS feed with full content but gated by a query parameter, then only give that URL to known-good consumers via email verification, a Patreon subscription, etc.

It would suck that people would have to pay more to consume content in their preferred way, but depending on your needs it might be a reasonable compromise.
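
One way the query-parameter gate could work, as a minimal sketch: derive a per-subscriber token with an HMAC so the server does not need to store tokens, and check it on each feed request. The parameter names `sub` and `token`, the secret, and the revocation set are all hypothetical choices for illustration, not anything a feed reader or Caddy requires.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Assumption: a long random server-side secret, loaded from config in real use.
SECRET_KEY = b"replace-with-a-long-random-secret"


def feed_token(subscriber_id):
    """Derive a stable per-subscriber token from the server secret."""
    digest = hmac.new(SECRET_KEY, subscriber_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def feed_url(base_url, subscriber_id):
    """Build the private full-content feed URL to hand to a verified reader."""
    query = urlencode({"sub": subscriber_id, "token": feed_token(subscriber_id)})
    return f"{base_url}?{query}"


def is_authorized(query, revoked=frozenset()):
    """Check the parsed query parameters of an incoming feed request."""
    subscriber_id = query.get("sub", "")
    token = query.get("token", "")
    if not subscriber_id or subscriber_id in revoked:
        return False
    expected = feed_token(subscriber_id)
    # Constant-time comparison so the token cannot be guessed byte by byte.
    return hmac.compare_digest(token.encode("utf-8"), expected.encode("utf-8"))
```

Because each URL is tied to one subscriber, a leaked URL that starts getting crawled can be traced and revoked without breaking the feed for everyone else.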