bitbasher 4 hours ago

Wow, so to prevent AI scrapers from harvesting my data I need to send all of my traffic through a third party company that gets to decide who gets to view my content. Great idea!

Aurornis 4 hours ago | parent | next [-]

You don’t need to do anything. You can use any number of solutions or roll your own.

Someone shared an alternative. Must everything in AI threads be so negative and condescending?

rester324 4 hours ago | parent | prev [-]

You can implement this yourself, who is stopping you?

zzzeek 4 hours ago | parent [-]

Citation needed

zimpenfish 4 hours ago | parent | next [-]

I use iocaine[0] to generate a tarpit. Yesterday it served ~278k "pages" consisting of ~500MB of gibberish (and that's despite banning most AI scrapers in robots.txt.)

[0] https://iocaine.madhouse-project.org
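For anyone wondering what a tarpit boils down to, here is a minimal sketch of the general idea. It is not iocaine's actual code: the word list, the `gibberish_page` function, and its parameters are all made up for illustration. Each URL deterministically produces a page of nonsense plus links to more nonsense pages, so a crawler that ignores robots.txt can follow them indefinitely.

```python
import hashlib
import random

# Hypothetical word list; a real tarpit would draw on a much larger corpus.
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "vizzini", "teapot", "maze"]


def gibberish_page(path: str, n_words: int = 200, n_links: int = 5) -> str:
    """Return a nonsense HTML page whose content depends only on `path`."""
    # Seed the RNG from the path so repeated requests for one URL look stable.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)

    text = " ".join(rng.choice(WORDS) for _ in range(n_words))

    # Every page links deeper into the maze, so there is always more to crawl.
    base = path.rstrip("/")
    links = "".join(
        f'<a href="{base}/{rng.getrandbits(32):08x}">more</a>\n'
        for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>\n{links}</body></html>"
```

You would serve this from whatever paths you want to sacrifice to crawlers. Nothing is stored on disk; the cost is CPU and outbound bandwidth for every page served.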

chao- 3 hours ago | parent | next [-]

Can't seem to access this.

It flashes some text briefly then gives me a 418 TEAPOT response. I wonder if it's because I'm on Linux?

EDIT: Begrudgingly checked Chrome, and it loads. I guess it doesn't like Firefox?

dpkirchner 2 hours ago | parent | next [-]

Nor Safari on iOS.

zephen 3 hours ago | parent | prev [-]

Doesn't work on my Firefox either.

Friendly fire, I suppose.

godelski 2 hours ago | parent [-]

Works on my Firefox. Mac and Linux

doublerabbit 3 hours ago | parent | prev [-]

Unfortunately, you kind of have to count this as the cost of the Internet. You've wasted 500MB of bandwidth.

I've had colocation for eight-plus years. Around 20-30GB of my monthly bandwidth now goes to scrapers, where I was only using 1-2GB a month years prior.

I pay for premium bandwidth (it's a thing) and only get 2TB of usable data. Do I go offline or let it continue?

godelski 3 hours ago | parent | prev | next [-]

One of the most popular ones is Anubis. It uses a proof of work and can even do poisoning: https://anubis.techaro.lol/
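To make "proof of work" concrete, here is a generic hashcash-style sketch. It is not Anubis's actual implementation; the `solve` and `verify` names, the challenge string, and the hex-zero difficulty rule are assumptions for illustration. The client burns CPU searching for a nonce, and the server checks the answer with a single hash.

```python
import hashlib
from itertools import count


def _digest(challenge: str, nonce: int) -> str:
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: find the first nonce whose digest starts with
    `difficulty` hex zeros. Expected work is about 16**difficulty hashes."""
    prefix = "0" * difficulty
    for nonce in count():
        if _digest(challenge, nonce).startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: checking a claimed solution costs one hash."""
    return _digest(challenge, nonce).startswith("0" * difficulty)
```

The asymmetry is the point: one visitor pays a small delay once, while a crawler hitting thousands of pages pays it over and over, assuming the server issues a fresh challenge each time.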

They even mention iocaine. I know, inconceivable!: https://iocaine.madhouse-project.org/

There's also tons of HN posts on the topic with varying solutions:

https://news.ycombinator.com/item?id=45935729

https://news.ycombinator.com/item?id=45711094

https://news.ycombinator.com/item?id=44142761

https://news.ycombinator.com/item?id=44378127

zzzeek a minute ago | parent [-]

Anubis is the only tool that claims to have heuristics to identify a bot, but my understanding is that it does this by presenting obnoxious challenges to all users. Not really feasible. Old school approaches like IP blocking or even ASN blocking are obsolete: these crawlers purposely spam from thousands of IPs, and if you block them on a common ASN, they come back a few days later from thousands of unique ASNs. So this is not really a "roll your own" situation.

GuinansEyebrows 4 hours ago | parent | prev | next [-]

https://forge.hackers.town/hackers.town/nepenthes

> Citation needed

this reply kinda sucks :)
