shiomiru 2 days ago
> They mostly aren't worth worrying about

Well, a common pattern I've lately been seeing is:

* Website goes down/barely accessible
* Webmaster posts "sorry we're down, LLM scrapers are DoSing us"
* Website accessible again, but now you need JS-enabled whatever the god of the underworld is testing this week with to access it.

(Alternatively, the operator decides it's not worth the trouble and the website shuts down.)

So I don't think your experience about LLM scrapers "not mattering" generalizes well.
horsawlarway 2 days ago
Nah - it generalizes fine. They're doing exactly what I said - adding PoW (Anubis - as you point out - being one solution) to gate access. That's hardly different from things like CAPTCHAs, which were a big thing even before LLMs and also required JavaScript.

Frankly - I'd much rather have people put Anubis in front of the site than Cloudflare, as an aside.

If the site really was static before, and no JS was needed - LLM scraping taking it down means it was incredibly misconfigured (an rpi can do thousands of reqs/s for static content, and caching is your friend).

---

Another great solution? Just ask users to log in (no JS needed). I'll stand pretty firmly behind "If you aren't willing to make an account - you don't actually care about the site".

My take is that search engines and sites generating revenue through ads are the most impacted. I just don't have all that much sympathy for either.

Functionally - I think trying to draw a distinction between accessing a site directly and using a tool like an LLM to access a site is a mistake. Like - this was literally the mission statement of the semantic web: "unleash the computer on your behalf to interact with other computers". It just turns out we got there by letting computers deal with unstructured data, instead of making all the data structured.
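
For anyone who hasn't looked at how PoW gating works in general, here's a rough hashcash-style sketch in Python. To be clear, this is not Anubis's actual code or protocol; the function names, the `challenge:nonce` format, and the difficulty value are all made up for illustration. The idea is just that the server hands out a random challenge, the client burns CPU finding a nonce, and the server checks it with a single hash.

```python
import hashlib
import secrets


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def make_challenge() -> str:
    """Server side: issue a random, unpredictable challenge string."""
    return secrets.token_hex(16)


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force nonces until the hash is 'small' enough.

    Expected cost is roughly 2**difficulty_bits hashes.
    """
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    difficulty = 12  # hypothetical setting; real deployments tune this
    challenge = make_challenge()
    nonce = solve(challenge, difficulty)
    print("solved:", verify(challenge, nonce, difficulty))
```

The asymmetry is the whole point: solving costs thousands of hashes per request, checking costs one. That's cheap for a single visitor and adds up quickly for a scraper hitting every page.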