vivzkestrel 11 hours ago

- The frontend has a decryption module that'll show users what they want to see.

- The backend has an encryption module.

- The bots and crawlers will see the encrypted text

- Can someone who peeks deeply inside the client side code decrypt it? YES

- Will 99% of the scrapers bother doing this? NO

- The key can be anything: a per-session key agreed upon between the client and the server, a CSRF token, or even a fixed key.
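
A minimal sketch of the scheme in the list above, assuming a shared key and a toy SHA-256 keystream XOR standing in for a real cipher. The names `keystream`, `obfuscate`, and `deobfuscate` are hypothetical, and this is obfuscation only; a real deployment would presumably use an authenticated cipher from a vetted library instead.

```python
import base64
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes by hashing key + counter (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def obfuscate(plaintext: str, key: bytes) -> str:
    """Backend side: XOR the UTF-8 bytes with the keystream, then base64-encode."""
    data = plaintext.encode("utf-8")
    mixed = bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
    return base64.b64encode(mixed).decode("ascii")


def deobfuscate(payload: str, key: bytes) -> str:
    """Frontend side: reverse the steps with the same shared key."""
    mixed = base64.b64decode(payload)
    data = bytes(a ^ b for a, b in zip(mixed, keystream(key, len(mixed))))
    return data.decode("utf-8")
```

A crawler that only reads the raw response sees the base64 payload, while anyone who extracts the key from the client code can call the equivalent of `deobfuscate` themselves. Reusing one key without a per-message nonce is also weak, which is acceptable only because the goal here is deterrence, not secrecy.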

hollowturtle 9 hours ago | parent | next [-]

Ehm, what would stop AI scrapers from using a browser like a normal user would? Googlebot already does: it can execute JS and read client-side generated SPA content, which proves it can be done at scale, and I'm pretty sure some AI scrapers already do it.

dirkc 3 hours ago | parent | next [-]

If you decrypt the content on the client side using an expensive decryption algorithm, the scraper needs to spend the computing resources to decrypt it.
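
One way to make decryption expensive, sketched under assumptions: derive the key with a deliberately slow key-derivation function, so every page view costs CPU time. `derive_key`, `timed_derivation`, the salt, and the iteration count are hypothetical; the thread does not specify an algorithm.

```python
import hashlib
import time


def derive_key(secret: bytes, salt: bytes, iterations: int) -> bytes:
    """Stretch `secret` into a 32-byte key; the cost grows linearly with `iterations`."""
    return hashlib.pbkdf2_hmac("sha256", secret, salt, iterations, dklen=32)


def timed_derivation(secret: bytes, salt: bytes, iterations: int) -> float:
    """Return the wall-clock seconds one key derivation takes on this machine."""
    start = time.perf_counter()
    derive_key(secret, salt, iterations)
    return time.perf_counter() - start
```

Raising `iterations` raises the per-page cost for a scraper, but the same cost is paid by every legitimate visitor's device.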

integralid an hour ago | parent [-]

Every visitor too. Mobile users are going to love this.

vivzkestrel 8 hours ago | parent | prev [-]

A per-IP rate limit whose allowed requests per minute keep decreasing every few minutes?
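
A rough sketch of what such a limiter could look like, assuming fixed one-minute windows and a limit that drops by a constant step for every `decay_seconds` a client has been active. The class name, parameters, and defaults are all hypothetical, and the caller passes in the clock so the behavior is easy to check.

```python
from dataclasses import dataclass


@dataclass
class ClientState:
    first_seen: float    # when this IP was first observed
    window_start: float  # start of the current one-minute window
    count: int = 0       # requests served in the current window


class DecayingRateLimiter:
    def __init__(self, base_limit=60, step=10, decay_seconds=300.0, floor=5):
        self.base_limit = base_limit        # requests/minute for a fresh client
        self.step = step                    # how much the limit drops per decay interval
        self.decay_seconds = decay_seconds  # length of one decay interval
        self.floor = floor                  # the limit never drops below this
        self.clients = {}                   # maps IP string -> ClientState

    def current_limit(self, state: ClientState, now: float) -> int:
        """Requests per minute allowed for a client that has been active this long."""
        steps = int((now - state.first_seen) // self.decay_seconds)
        return max(self.floor, self.base_limit - steps * self.step)

    def allow(self, ip: str, now: float) -> bool:
        """Return True if the request at time `now` (in seconds) should be served."""
        state = self.clients.get(ip)
        if state is None:
            state = ClientState(first_seen=now, window_start=now)
            self.clients[ip] = state
        if now - state.window_start >= 60.0:
            # Start a fresh one-minute window.
            state.window_start = now
            state.count = 0
        if state.count >= self.current_limit(state, now):
            return False
        state.count += 1
        return True
```

In this sketch the limit never recovers once it has decayed, so a real version would probably reset `first_seen` after a period of inactivity.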

prmoustache 6 hours ago | parent | next [-]

What if the scrapers' IPs are millions of smartphones? If I were as evil as an AI scraper company that does not obey robots.txt, I would totally build or buy thousands of small mobile games and apps and use them as jump hosts to scrape the web. This is probably happening already.

vivzkestrel 4 hours ago | parent [-]

In my case the application does not use pagination, it uses infinite scroll. Even if you had a million devices running Google Chrome, they would all load page 1, and if the progressively decreasing req/minute limit is implemented, they would all hit the rate limit sooner or later once they start scrolling endlessly. The thing is, a human is not going to scroll down 100 pages, but a bot will. Once this difference has been factored in, it won't matter how many unique devices they bring into the battle.

chii 6 hours ago | parent | prev [-]

So why not just do that for these scrapers, rather than complicate it with encrypting and decrypting, which is just obfuscation since the decryption key is clearly available to the end user?

vivzkestrel 4 hours ago | parent [-]

Tbh, I did not add encryption/decryption for the AI scrapers at all. A lot of people were previously trying to download data directly from the API that my frontend uses, and this kinda pissed me off a bit. So I added encryption/decryption to the mix and will release the newer version. As I mentioned earlier, can someone sit down and decrypt it? Yes. Will 99% of them do it? No! That's where I win.

PunchyHamster 8 hours ago | parent | prev [-]

The AI will just run a Chrome instance.