SchemaLoad 5 hours ago

Cloudflare has a service for this now that will detect AI scrapers and send them to a tarpit of infinite AI generated nonsense pages.

bitbasher 4 hours ago | parent | next [-]

Wow, so to prevent AI scrapers from harvesting my data I need to send all of my traffic through a third party company that gets to decide who gets to view my content. Great idea!

Aurornis 4 hours ago | parent | next [-]

You don’t need to do anything. You can use any number of solutions or roll your own.

Someone shared an alternative. Must everything in AI threads be so negative and condescending?

rester324 4 hours ago | parent | prev [-]

You can implement this yourself, who is stopping you?

zzzeek 4 hours ago | parent [-]

Citation needed

zimpenfish 4 hours ago | parent | next [-]

I use iocaine[0] to generate a tarpit. Yesterday it served ~278k "pages" consisting of ~500MB of gibberish (and that's despite banning most AI scrapers in robots.txt.)

[0] https://iocaine.madhouse-project.org
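For anyone wondering what "generating a tarpit" can look like in practice, here is a minimal sketch in Python. It is not how iocaine works internally; the word list, the `/maze/` URL prefix, and the `gibberish_page` function are all hypothetical names made up for illustration. The idea is that every URL deterministically yields a page of filler text plus links to more generated URLs, so a crawler that follows links never runs out of pages.

```python
import hashlib
import random

# Hypothetical filler vocabulary; a real tarpit would use something larger.
WORDS = ["signal", "noise", "lattice", "amber", "vector", "quorum",
         "harbor", "cipher", "meadow", "orbit", "thimble", "granite"]


def gibberish_page(path: str, n_words: int = 120, n_links: int = 5) -> str:
    """Return a deterministic nonsense HTML page for the given URL path."""
    # Seed the RNG from the path so the same URL always yields the same page
    # and no state has to be stored on the server.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    text = " ".join(rng.choice(WORDS) for _ in range(n_words))
    # Each page links to further generated pages, so the maze never ends.
    links = "".join(
        f'<a href="/maze/{rng.getrandbits(64):016x}">{rng.choice(WORDS)}</a>\n'
        for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>\n{links}</body></html>"


if __name__ == "__main__":
    print(gibberish_page("/maze/start"))
```

Seeding from the path keeps the server stateless, which matters when you are serving hundreds of thousands of these a day. Note that the bandwidth is still yours to pay for.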

chao- 3 hours ago | parent | next [-]

Can't seem to access this.

It flashes some text briefly then gives me a 418 TEAPOT response. I wonder if it's because I'm on Linux?

EDIT: Begrudgingly checked Chrome, and it loads. I guess it doesn't like Firefox?

dpkirchner 2 hours ago | parent | next [-]

Nor Safari on iOS.

zephen 3 hours ago | parent | prev [-]

Doesn't work on my Firefox either.

Friendly fire, I suppose.

godelski 2 hours ago | parent [-]

Works on my Firefox. Mac and Linux

doublerabbit 3 hours ago | parent | prev [-]

Unfortunately, you kind of have to count this as the cost of the Internet. You've wasted 500MB of bandwidth.

I've had colocation for eight-plus years. Scrapers now eat around 20-30GB of my bandwidth a month, where I was only using 1-2GB a month in years prior.

I pay for premium bandwidth (it's a thing) and only get 2TB of usable data. Do I go offline or let it continue?

godelski 3 hours ago | parent | prev | next [-]

One of the most popular ones is Anubis. It uses a proof of work and can even do poisoning: https://anubis.techaro.lol/

They even mention iocaine. I know, inconceivable!: https://iocaine.madhouse-project.org/

There are also tons of HN posts on the topic with varying solutions:

https://news.ycombinator.com/item?id=45935729

https://news.ycombinator.com/item?id=45711094

https://news.ycombinator.com/item?id=44142761

https://news.ycombinator.com/item?id=44378127
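To make "proof of work" concrete, here is a generic hashcash-style sketch in Python. This is an assumption about the general technique, not Anubis's actual protocol or parameters; `make_challenge`, `solve`, `verify`, and the `difficulty_bits` value are hypothetical names chosen for illustration. The server hands out a random challenge, the client must find a nonce whose SHA-256 hash has enough leading zero bits, and the server checks the answer with a single hash.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random, unpredictable challenge string."""
    return secrets.token_hex(16)


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash to check that the nonce meets the target."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode("utf-8")).digest()
    # The top `difficulty_bits` bits of the 256-bit hash must all be zero.
    return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force nonces until one verifies."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty_bits=16)
    print(challenge, nonce, verify(challenge, nonce, 16))
```

The asymmetry is the point: solving takes about 2^difficulty_bits hashes on average, checking takes one. That is cheap for a single human visitor and adds up for a crawler fetching millions of pages.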

zzzeek 3 minutes ago | parent [-]

Anubis is the only tool that claims to have heuristics to identify a bot, but my understanding is that it does this by presenting obnoxious challenges to all users. Not really feasible. Old school approaches like IP blocking or even ASN blocking are obsolete - these crawlers purposely spam from thousands of IPs, and if you block them on a common ASN, they come back a few days later from thousands of unique ASNs. So this is not really a "roll your own" situation.

GuinansEyebrows 4 hours ago | parent | prev | next [-]

https://forge.hackers.town/hackers.town/nepenthes

> Citation needed

this reply kinda sucks :)

justkys 4 hours ago | parent | prev [-]

[dead]

timpera 4 hours ago | parent | prev | next [-]

Unfortunately, Cloudflare often destroys the experience for users with shared connections, VPNs, exotic browsers… I had to remove it from my site after too many complaints.

xorcist 4 hours ago | parent | next [-]

I am sure Cloudflare would have no problem selling you a VPN service.

After all, it's not very far from hosting booters and selling DoS protection.

sadeshmukh 2 hours ago | parent [-]

Well... https://developers.cloudflare.com/warp-client/warp-modes/#wa...

Price is $5/mo

rudedogg 4 hours ago | parent | prev [-]

Also iCloud Private Relay.

Cloudflare is making it impossible to browse privately.

acdha 3 hours ago | parent [-]

Cloudflare works fine with Private Relay - they and Fastly provide infrastructure for that service (one half of the blinded pair) so it’s definitely something they test.

loopback_device 2 hours ago | parent | prev | next [-]

Not sure "TLS added and removed here :)" as a Service is the right tool in the drawer for this.

atomic128 4 hours ago | parent | prev | next [-]

Poison Fountain: https://news.ycombinator.com/item?id=46577464

m463 3 hours ago | parent | prev | next [-]

cloudflare also blocks my human-is-driving browser all the time

"enable javascript and cookies to continue"

also unsupported browser

themafia 4 hours ago | parent | prev | next [-]

The solution, as always, is noise.

RobotToaster 4 hours ago | parent | prev | next [-]

Except for the scrapers that pay cloudflare to exempt them.

ranger_danger 4 hours ago | parent | prev | next [-]

Modern scrapers are using headless Chromium, which will not see the invisible links, so I'm not sure how long this will be effective.

yakattak 5 hours ago | parent | prev | next [-]

Do you have a link to that?

SchemaLoad 5 hours ago | parent [-]

https://blog.cloudflare.com/ai-labyrinth/

inferiorhuman 4 hours ago | parent | prev [-]

Which is still a far worse experience than if Cloudflare's services weren't needed.