isgb 3 days ago

I've been thinking it'd be nice if there was a way to just block AI bots completely while still allowing indexing, but I'm guessing [that's impossible](https://blog.cloudflare.com/perplexity-is-using-stealth-unde...).

Are there any solutions out there that render jumbled content to crawlers? Maybe it's enough that your content shows up in Google searches based on keywords, even if the preview text is jumbled.

account42 2 days ago

Tarpits seem to be the best solution today. Of course it's an arms race, but if your site is small, so is the effort anyone will take to work around your solutions.
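
For anyone unfamiliar, here is a minimal sketch of the tarpit idea, not any particular tool's implementation. It assumes a hypothetical `/maze/` URL prefix that real visitors never reach; the function names, chunk size, and delay are made up for illustration. Each page links only to more generated pages, and the response is dripped out slowly to keep the crawler's connection busy.

```python
import hashlib
import time
from typing import Iterator


def tarpit_page(path: str, n_links: int = 5) -> str:
    """Build a deterministic page whose links only lead deeper into the maze."""
    links = []
    for i in range(n_links):
        # Hash the current path plus an index so every page gets its own children.
        token = hashlib.sha256(f"{path}#{i}".encode()).hexdigest()[:12]
        # "/maze/" is a hypothetical prefix reserved for trapped crawlers.
        links.append(f'<a href="/maze/{token}">{token}</a>')
    return "<html><body>" + "<br>".join(links) + "</body></html>"


def drip(body: str, chunk_size: int = 16, delay: float = 1.0) -> Iterator[str]:
    """Yield the body in small chunks, sleeping before each one."""
    for start in range(0, len(body), chunk_size):
        time.sleep(delay)
        yield body[start:start + chunk_size]
```

Hooked up to a streaming response in whatever web framework you use, `drip(tarpit_page(request_path))` costs you almost nothing per request, while the crawler spends a second per 16 bytes on pages that never end.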

pixl97 2 days ago

How does this even make sense? At the end of the day everything has to be rendered to a screen buffer. It's more expensive, but LLMs can read the content in that image.

About the best you could do is some kind of DRM, but that is fraught with its own dangers and problems.

isgb 2 days ago

Well, I'd like my writing and my code to be something I can share with other people freely, but not let it be part of a data set some company uses in their for-profit product.

Of course, a crawler can also mock user agents and fetch data in patterns that emulate real users and there'd be no way to tell - but maybe we could supply real-seeming data (at least to the crawlers we can identify) and that'll be good enough?
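
A rough sketch of that idea, under heavy assumptions: the user-agent substrings below are only examples of tokens some AI crawlers are known to send, the helper names are hypothetical, and anything that spoofs a browser user agent walks straight past it. The jumbling here shuffles word order within each sentence, so every keyword is still on the page but the prose itself is scrambled.

```python
import random
import re

# Illustrative substrings only; real crawlers may send different or spoofed user agents.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "PerplexityBot")


def is_known_ai_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match against the token list above."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def jumble(text: str, seed: int = 0) -> str:
    """Shuffle word order inside each sentence; the set of words is unchanged."""
    rng = random.Random(seed)
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    out = []
    for sentence in sentences:
        words = sentence.split()
        rng.shuffle(words)
        out.append(" ".join(words))
    return " ".join(out)


def render(text: str, user_agent: str) -> str:
    """Serve jumbled text to identified crawlers and the original to everyone else."""
    return jumble(text) if is_known_ai_crawler(user_agent) else text
```

Calling `render(article_text, request_user_agent)` at response time would be the whole integration. Whether word-level shuffling is "real-seeming" enough is an open question; it is just the cheapest version to show the shape of it.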