simonw 15 hours ago

I would love to understand this.

Just a few years ago badly behaved scrapers were rare enough not to be worth worrying about. Today they are such a menace that hooking any dynamic site up to a pay-to-scale hosting platform like Vercel or Cloud Run can trigger terrifying bills on very short notice.

"It's for AI" feels like lazy reasoning to me... but what IS it for?

One guess: maybe there's enough of a market now for buying freshly updated scrapes of the web that it's worth a bunch of chancers running a scrape. But who are the customers?

SCHiM 4 hours ago

The bar to ingest unstructured data into something usable was lowered, causing more people to start doing it.

It used to be that you needed to implement some papers to do sentiment analysis, a reasonably high bar to entry. Now anyone can do it. The result: more people scraping (and with less competent scrapers, too).

devsda 15 hours ago

For whatever reason, legislation is lax right now if you claim the purpose of scraping is for AI training even for copyrighted material.

Maybe everyone is trying to take advantage of the situation before the law eventually catches up.

Imustaskforhelp 4 hours ago

> For whatever reason, legislation is lax right now if you claim the purpose of scraping is for AI training even for copyrighted material

I think the reason is that America and China are, for the most part, in an AI arms race, combined with an AI bubble, and neither side wants to lose any perceived advantage, no matter the cost to others.

Also, there is an immense lobbying effort against senators who propose stricter AI regulation.

https://www.youtube.com/watch?v=DUfSl2fZ_E8 [What OpenAI doesn't want you to know]

It's actually a great watch. Highly recommended, because a lot of the talk about regulation feels like smoke and mirrors to me.