kitku 5 days ago
This reminds me of the Nepenthes tarpit [1], an endless source of garbled text, generated on the fly, that links to itself over and over. It's probably more effective at poisoning the dataset, if one has the resources to run it.
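A minimal sketch of how a tarpit like this can work. It assumes a word-level Markov chain for the babble, and it is not Nepenthes' or Iocaine's actual code. The function names and the `/maze/` URL prefix are hypothetical. Each page is seeded from its own path, so the same URL always returns the same text, and every page links to more generated URLs. A crawler that follows links therefore never runs out of pages.

```python
import hashlib
import html
import random
from collections import defaultdict


def build_chain(corpus: str) -> dict[str, list[str]]:
    """First-order Markov chain: map each word to the words that followed it in the corpus."""
    words = corpus.split()
    chain: dict[str, list[str]] = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return dict(chain)


def babble(chain: dict[str, list[str]], rng: random.Random, length: int = 50) -> str:
    """Random-walk the chain to produce `length` words of plausible-looking nonsense."""
    word = rng.choice(list(chain))
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # Dead end (a word that was never followed by anything): jump to a random word.
        word = rng.choice(followers) if followers else rng.choice(list(chain))
        out.append(word)
    return " ".join(out)


def tarpit_page(path: str, chain: dict[str, list[str]], n_links: int = 5) -> str:
    """Render one hypothetical tarpit page whose links all point deeper into the maze."""
    # Seed from a stable hash of the path (not Python's salted hash()), so a URL
    # always serves the same page and looks like static content to a crawler.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    text = html.escape(babble(chain, rng))
    links = []
    for _ in range(n_links):
        slug = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))
        label = html.escape(rng.choice(list(chain)))
        links.append(f'<a href="/maze/{slug}">{label}</a>')
    return f"<html><body><p>{text}</p>{' '.join(links)}</body></html>"


if __name__ == "__main__":
    corpus = "the crawler follows the link and the link leads to the maze and the maze never ends"
    print(tarpit_page("/maze/start", build_chain(corpus)))
```

Generating each page deterministically from its path means nothing has to be stored per URL. Memory use is just the chain itself, which is one reason tools like this can run on very small machines.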
fleebee 5 days ago
I'm running Iocaine [1], which is essentially the same thing, on my tiny $3/mo VPS, and it handles crawlers bombarding the honeypot at ~12 requests per second just fine while using about 30 MB of RAM.
8organicbits 5 days ago
Do we know if LLM scrapers are running JavaScript on the pages? If they are, maybe it's worth offloading the Markov model to the client side. |