|
| zimpenfish 4 hours ago |
| I use iocaine[0] to generate a tarpit. Yesterday it served ~278k "pages" amounting to ~500MB of gibberish (and that's despite banning most AI scrapers in robots.txt). [0] https://iocaine.madhouse-project.org |
| |
| | chao- 3 hours ago |
| | Can't seem to access this. It flashes some text briefly, then gives me a 418 TEAPOT response. I wonder if it's because I'm on Linux? EDIT: Begrudgingly checked Chrome, and it loads. I guess it doesn't like Firefox? |
| | doublerabbit 3 hours ago |
| | Unfortunately, you kind of have to count this as the cost of the Internet. You've wasted 500MB of bandwidth. I've had colocation for eight-plus years. Scrapers now take around 20-30GB of my bandwidth a month, where years ago I was only using 1-2GB a month. I pay for premium bandwidth (it's a thing) and only get 2TB of usable data. Do I go offline or let it continue? |
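
For anyone wondering what a tarpit like the one mentioned at the top of this subthread does in principle, here is a minimal sketch. It is not iocaine's implementation: the word list, the `tarpit_page` function, and the link scheme are all made up for illustration. The idea is just that every URL deterministically yields a page of filler text plus links to more such pages, so a crawler that ignores robots.txt never runs out of things to fetch.

```python
import hashlib
import random

# Hypothetical filler vocabulary; a real tarpit would use something less obvious.
WORDS = ["lorem", "vizzini", "iocane", "cliff", "sword", "giant", "castle", "rodent"]

def tarpit_page(path: str, n_words: int = 200, n_links: int = 5) -> str:
    """Return a gibberish HTML page derived only from the request path."""
    # Seed the RNG from the path so the same URL always gives the same page,
    # without storing anything on the server.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    text = " ".join(rng.choice(WORDS) for _ in range(n_words))
    # Each link points at another path that will produce yet another page.
    links = "\n".join(
        f'<a href="/{rng.getrandbits(64):016x}">{rng.choice(WORDS)}</a>'
        for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>\n{links}\n</body></html>"
```

As the reply above points out, every page served this way still costs you bandwidth, so small pages and a rate limit matter if you pay per GB.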
|
|
| godelski 3 hours ago |
| One of the most popular ones is Anubis. It uses proof of work and can even do poisoning: https://anubis.techaro.lol/ They even mention iocaine. I know, inconceivable!: https://iocaine.madhouse-project.org/ There are also tons of HN posts on the topic with varying solutions: https://news.ycombinator.com/item?id=45935729 https://news.ycombinator.com/item?id=45711094 https://news.ycombinator.com/item?id=44142761 https://news.ycombinator.com/item?id=44378127 |
| |
| | zzzeek a minute ago |
| | Anubis is the only tool that claims to have heuristics to identify a bot, but my understanding is that it does this by presenting obnoxious challenges to all users. Not really feasible. Old-school approaches like IP blocking or even ASN blocking are obsolete: these crawlers purposely spam from thousands of IPs, and if you block them on a common ASN, they come back a few days later from thousands of unique ASNs. So this is not really a "roll your own" situation. |
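
Since proof of work comes up in this subthread, here is a rough sketch of what a hash-based challenge generally looks like. This is an assumption-laden illustration, not Anubis's actual scheme: the function names, the `challenge:nonce` format, and the difficulty value are all hypothetical. The server hands out a random challenge, the client burns CPU finding a nonce, and the server checks the answer with a single hash.

```python
import hashlib
import secrets

def make_challenge() -> str:
    """Server side: a fresh random challenge per visitor (hypothetical format)."""
    return secrets.token_hex(16)

def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce; cost doubles with each extra bit."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits
```

The asymmetry is the point: solving takes about 2^difficulty_bits hashes on average while verifying takes one. It also shows the complaint above, since every visitor pays that cost, not only the bots.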
|
|
| GuinansEyebrows 4 hours ago |
| https://forge.hackers.town/hackers.town/nepenthes |
| > Citation needed |
| this reply kinda sucks :) |
|