renewiltord 9 hours ago

I've tried to explain this to people repeatedly and they don't get it. They're always like "oh no the AI scraper is slamming my website it's ruining everything". Um, maybe configure your web server to not send me data if you don't want me 'scraping' your website. It's literally your server's choice to send me data. I'm just asking from a few IPs. If you want to send data to all of them that's your server's choice.

But I think people don't get the fact that they can just request payment or only send to authenticated users from authorized IPs and so on. Instead they want to send to all IPs without payment but then get upset when I use a bunch of IPs without paying. Weird.

I'm trying to read a bunch of stuff. The entire point of a computer is to make that easy. I'm not going to repetitively click through a bunch of links when a bot can do that way faster.
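The "let a bot click the links" idea above amounts to extracting every link from a page the server has already sent, instead of following them by hand. A minimal sketch with Python's stdlib `html.parser`; the HTML snippet and paths here are made up for illustration:

```python
# Sketch: collecting links from one already-fetched page, rather than
# clicking through them manually. The HTML string is an invented stand-in
# for a response the server chose to send.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Records the href attribute of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = '<a href="/article/1">one</a> <a href="/article/2">two</a>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/article/1', '/article/2']
```

A real crawler would then fetch each collected link in turn; this sketch stops at the extraction step to stay self-contained.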

gusgus01 8 hours ago

And what is the surefire way to stop AI scrapers from accessing your website? If there is no way, how can this be an acceptable ask?

It already sounds like you're using several IPs to access sites, which seems like a workaround for someone somewhere trying to limit the use of one IP (or just a lack of desire to host and distribute the data yourself to your various hosts).

Just because you can do something doesn't mean everyone must accept and like that you are doing that thing.

renewiltord 8 hours ago

The answer is right there: use authentication with cost per load, or an IP whitelist.

GP is absolutely right. If your server is just going to send me traffic when I ask I’m just going to ask and do what I want with the response.

Your server will respond fine if I click through with different IPs, and distributing requests across IPs is exactly the kind of menial task we made computers for.

Yeah, you’re right of course that no one has to like the “piracy” or “scraping” or whatever other name you’re giving to a completely normal request-response interaction between machines. They can complain. And I can say they’re silly for complaining. No one has to like anything. Heck you could hate ice cream.

gusgus01 5 hours ago

As long as we all understand that this mentality is advocating for the end of an open internet. This is the tragedy of the commons in action: the removal of a common good because the few who would take advantage of it do. Just because something is programmed as a request-response interaction (though the use of blocklists, robots.txt, etc. should reveal it's not simply that) doesn't mean we have to go all or nothing in ensuring it's not abused. We are still the operators of programs; it's still a social contract. If I block an IP and the same operator shows up with a different IP, it's like getting kicked out of a bar, coming back in a fake mustache, and then acting confused about why that's wrong just because the bar doesn't keep a members list.
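The robots.txt mention is a good example of the social-contract point: nothing enforces it, but it plainly signals intent, and honoring it costs a crawler a few lines. A minimal check using Python's stdlib parser; the rules below are made up:

```python
# Illustration that robots.txt states intent even though nothing enforces
# it: a polite crawler can check the rules before fetching. These example
# rules and URLs are invented.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /private/",
]
rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://example.com/public/page"))   # True
print(rp.can_fetch("*", "https://example.com/private/page"))  # False
```

A crawler that ignores this check still gets a response, which is exactly the asymmetry the thread is arguing about: the file is a request, not a gate.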

A personal website is like a community cupboard or an open-access water tap: people put it out there for others to enjoy, but when the reseller shows up and takes it all, it's no longer sustainable to provide the service.

Of course, it's all a spectrum: from monster corporations that build the loss into their projections and participate in wholesale data collection and selling, to open websites with no ads or limited ads as a sort of donation box; from a person using CSS/JS to block ads, or software to pirate cheaper entertainment, to an AI scraper using swathes of IPs and servers to non-stop request all the data you're hosting for its own monetary gain. I have different opinions depending on where on the spectrum you are. But I do think piracy and ad blocking are on the same spectrum, and much closer to acceptable than mass AI scraping.

These responses were more about your comments on AI scraping than about the piracy vs. ad blocking conversation, but in my opinion the gap between those and scraping is quite large.

renewiltord 2 hours ago

Everyone thinks that their specific pet thing is the precious commons and the other guy is the abuser. But in any case, one should be able to follow the reasoning.

If blocking ads is permissible because the server cannot control the client but can control itself, then so is "scraping". Both servers ask of their clients something they cannot enforce, and both find that the clients refuse.

If you find the justification valid but nonetheless decide the conclusion is absurd, you must find which step in the reasoning fails. The temptation is to add epicycles: corporations vs. humans, or something of the sort; commercial vs. non-commercial.

But on its own that distinction has no justification. It's just that your principles lead you to absurdity and you refuse to revisit them, because you like taking from others but don't like when others take from you. A fairly simple answer. Nothing for Occam's Razor to divide.

Particularly believable because the arrival of AI models trained on the world seems to have coincided with a kind of copyright maximalism this forum has never seen before. Were the advocates of the RIAA simply not users yet?

Or, more believably, is it just that taking feels good but being taken from feels bad?