Remix.run Logo
vivzkestrel 6 hours ago

rate limit per ip that progressively keeps decreasing req/mins every few mins?

prmoustache 4 hours ago | parent | next [-]

What if scrapers ips are millions of smartphones? If I was as evil as an AI scraper company that is not obeying robots.txt I would totally build/buy thousands of small games/apps for mobiles to use them as jumphosts to scape the web. This is probably happening already.

vivzkestrel 2 hours ago | parent [-]

in my case my application does not use pagination, it uses infinite scroll, even if you had a million devices that use google chrome, they would all load page 1 and if that req/minute progressively decreaasing thing is implemented, once they start scrolling endlessly they would all hit the rate limits sooner or later, the thing is a human is not going to scroll down a 100 pages but a bot will. once this difference has been factored it, it wont matter how many unique devices they bring into the battle

chii 4 hours ago | parent | prev [-]

so why not just do that for these scrapers, rather than complicate it by encrypting and decrypting, which is just obfuscation as the private key is clearly available to the end-user?

vivzkestrel 2 hours ago | parent [-]

tbh i did not encrypt decrypt for the ai scrapers at all, a lot of people were previously trying to download data directly from my API that my frontend uses and this kinda pissed me off a bit. So I added encryption/decryption to the mix and will release the newer version. As I mentioned earlier as well, can someone sit through and decrypt it? yes. Will 99% of them do it? no! Thats where I win