input_sh 9 hours ago

> How come even small sites get hammered constantly?

Because big sites have decades of experience fighting against scrapers and have recently upped their game significantly (even when doing so carries some SEO costs) so that they're the only ones that can train AI on their own data.

So now, when you're starting from scratch and your goal is to gather as much data as possible, targetting smaller sites with weak / non-existent scraping protection is the path of least resistence.

andai 6 hours ago | parent

No, I meant like: if you have a blog with 10 posts, do they just scrape the same 10 pages thousands of times?

Because people are reporting constant traffic, which would imply that the site is being scraped millions of times per year. How does that make any sense? Are there millions of AI companies?

marcthe12 5 hours ago | parent

Basically the scrapers don't bother to cache your website, or if they do, it's with an insanely low TTL. They also don't special-case different kinds of content, so the worst-hit sites are things like git hosting, because the BFS-style scrape follows every link. The worst part is that a lot of this is done via tunneling, so the IP can be different each time or come from residential IPs, which makes it annoying.
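
For illustration, here's a minimal Python sketch of that pattern (every name, URL and proxy below is a made-up placeholder, not any real scraper's code): a BFS crawl with no persistent cache and a rotating proxy pool re-fetches every page on every run, and each request arrives from a different IP.

    # Hypothetical sketch: BFS crawl, no cache between runs, rotating exit IPs.
    import itertools
    import re
    from collections import deque
    from urllib.parse import urljoin, urlparse

    import requests

    SEED = "https://example-small-blog.test/"   # placeholder site
    PROXY_POOL = itertools.cycle([              # placeholder proxy exits
        "http://proxy-a.example:8080",
        "http://proxy-b.example:8080",
    ])
    LINK_RE = re.compile(r'href="([^"#]+)"')

    def crawl_once(seed: str) -> None:
        """One BFS pass over every link reachable from the seed.
        Nothing is persisted, so every run re-fetches every page."""
        seen, queue = {seed}, deque([seed])
        while queue:
            url = queue.popleft()
            try:
                resp = requests.get(
                    url,
                    proxies={"https": next(PROXY_POOL)},  # different exit per request
                    timeout=10,
                )
            except requests.RequestException:
                continue
            for href in LINK_RE.findall(resp.text):
                nxt = urljoin(url, href)
                # Follow every internal link; on a git host this explodes into
                # every commit, diff and blame view.
                if urlparse(nxt).netloc == urlparse(seed).netloc and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)

    # The "seen" set dies with the process, so running this hourly means a
    # 10-post blog gets fully re-fetched dozens of times a day by one crawler.
    for _ in range(24):
        crawl_once(SEED)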