superkuh 4 hours ago

Recently there have been more crawlers coming from tens to hundreds of IP netblocks across dozens (or more!) of ASNs, in a highly time- and URL-correlated fashion, with spoofed user-agent(s) and no regard for rate or request limiting or robots.txt. They attempt to visit every possible permutation of URLs on the domain and have a lot of bandwidth and established TCP connections available to them. It's not that this didn't happen pre-2023, but it's noticeably more common now. If you run a public webserver you've probably experienced it at least once.
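
To make the "many IPs, one crawl" pattern concrete, here is a rough Python sketch of why per-IP counters miss it: each address only makes a request or two, but grouped by netblock the same traffic looks like one client walking URL permutations. The function name, the /24 grouping, and the sample addresses (documentation ranges) are all my own assumptions for illustration, not anything from a real log or tool.

```python
from collections import defaultdict
from ipaddress import ip_network


def group_by_netblock(requests, prefix_len=24):
    """Group (ip, url) pairs by enclosing netblock.

    prefix_len=24 is an assumed grouping for IPv4; real crawlers may
    spread across much larger or unrelated ranges.
    """
    blocks = defaultdict(lambda: {"ips": set(), "urls": set(), "hits": 0})
    for ip, url in requests:
        # strict=False accepts a host address and returns its network
        block = str(ip_network(f"{ip}/{prefix_len}", strict=False))
        entry = blocks[block]
        entry["ips"].add(ip)
        entry["urls"].add(url)
        entry["hits"] += 1
    return dict(blocks)


# Hypothetical requests seen within one short time window.
sample = [
    ("198.51.100.7", "/a?sort=1"),
    ("198.51.100.9", "/a?sort=2"),
    ("198.51.100.200", "/a?sort=3"),
    ("203.0.113.5", "/index.html"),
]
summary = group_by_netblock(sample)
```

Here every IP in the first block makes exactly one request, so a per-IP rate limit never triggers, yet the block as a whole accounts for three distinct URL permutations. Grouping by ASN instead of a fixed prefix would need an external IP-to-ASN dataset, which this sketch leaves out.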

Actual LLM involvement as the requesting user-agent is vanishingly small. It's the same problem as ever: corporations, their profit motive during $hypecycle coupled with access to capital for IT resources, and the protection of the abusers by the corporate structure, which abstracts away legal liability for their behavior.