re-l 7 hours ago

Don’t get me wrong, but what’s the problem with scrapers? People invest in SEO to become more visible, yet at the same time they fight against “scraper bots.” I’ve always thought the whole point of publicly available information is to be visible. If you want to make money, just put it behind a paywall. Isn’t that the idea?

georgefrowny 7 hours ago

There's a difference between putting information online where your customers, or even people in general (e.g. as a hobby), can easily reach it, while working in concert with scrapers for greater visibility via search, and giving that work away, or providing it at a cost to yourself, to companies that at best don't care and may well be competitors, see themselves as replacing you, or are otherwise adversarial.

The line is between "I am technically able to do this" and "I am engaging with a system in good faith".

Public parks are just there, and I can technically drive up and dump rubbish in them; if they didn't want me to, they should have installed a gate and sold tickets.

In that analogy, many scrapers these days are roughly equivalent to people starting entire fleets of waste disposal vehicles that all drive to parks to unload, putting strain on park operations and making parks a less tenable service in general.

nrhrjrjrjtntbt 7 hours ago

The old scrapers indexed your site so you may get traffic. This benefits you.

AI scrapers will plagiarise your work and bring you zero traffic.

ProofHouse 6 hours ago

Ya make sure you hold dear that grain of sand on a beach of pre-training data that is used to slightly adjust some embedding weights

jcynix 5 hours ago

Sand is the world's second most used natural resource, and sand usable for concrete is even being removed illegally all over the world nowadays.

So to continue your analogy, I made my part of the beach accessible for visitors to enjoy, but certain people think they can carry it away for their own purpose ...

boxedemp 5 hours ago

One Reddit post can get an LLM to recommend putting glue in your pizza. But the takeaway here is to cheese the bots.

throwawa14223 3 hours ago

I have no reason to help the richest companies on earth adjust weights at a cost to myself.

exe34 5 hours ago

That grain of sand used to bring traffic; now it doesn't. It's pretty much an economic catastrophe for those who relied on it. And it's not free to provide the data to those who will replace you: they abuse your servers while doing it.

saltysalt 5 hours ago

You are correct, and the hard reality is that content producers don't get to pick and choose who indexes their public content, because the bad bots don't play by the rules of robots.txt or user-agent strings. In my experience, bad bots do everything they can to pass as regular users (fake IPs, fake agent strings, and so on), so it's hard to sort them from regular traffic.
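
To illustrate why robots.txt only restrains bots that choose to obey it, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents, the bot name `ExampleAIBot`, and the helper `is_allowed` are made up for illustration. The check runs inside the crawler, keyed on whatever user-agent string the crawler decides to report, so the server never gets to enforce it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for this example: one named bot is
# banned entirely, everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return what a *cooperating* crawler would conclude from robots.txt."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"
    # An honest bot that identifies itself is told to stay away.
    print("ExampleAIBot:", is_allowed("ExampleAIBot", url))
    # The same bot claiming to be a browser gets a different answer,
    # and a bot that never runs this check at all is not stopped by it.
    print("Browser-like UA:", is_allowed("Mozilla/5.0 (X11; Linux x86_64)", url))
```

Nothing in this check is visible to or verified by the site being crawled, which is why a spoofed user-agent string, or simply skipping the check, defeats it.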

Dilettante_ 6 hours ago

Did you read TFA?

These scrapers drown people's servers in requests, taking up literally all the resources and driving up costs.