everforward 3 hours ago

I think it’s a) the sheer volume of scrapers, b) a desire for _all_ content instead of particular content, and c) the scrapers being new, without the decades of patches Googlebot et al. have accumulated.

5 years ago there were few people with an active interest in scraping ForgeJo instances and personal blogs. Now there are a bajillion companies and individuals getting data to train a model or throw in RAG or whatever.

Having a better scraper means more data, which means a better model (handwavily) so it’s a competitive advantage. And writing a good, well-behaved distributed scraper is non-trivial.
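To make "well-behaved" concrete, a minimal sketch in Python of the two politeness basics a good scraper needs: honoring robots.txt and rate-limiting per host. The class and parameter names here are illustrative, not any real crawler's API; the distributed coordination part (sharing per-host state across many workers) is exactly what makes the real thing non-trivial.

```python
import urllib.robotparser

class PoliteFetcher:
    """Illustrative sketch: respects robots.txt and spaces out
    requests to each host. Not a real crawler's interface."""

    def __init__(self, user_agent="example-bot", min_delay=1.0):
        self.user_agent = user_agent
        self.min_delay = min_delay   # seconds between hits to one host
        self.parsers = {}            # host -> cached RobotFileParser
        self.last_hit = {}           # host -> timestamp of last request

    def allowed(self, host, path, robots_txt):
        # Parse (and cache) this host's robots.txt, then ask it about path.
        if host not in self.parsers:
            rp = urllib.robotparser.RobotFileParser()
            rp.parse(robots_txt.splitlines())
            self.parsers[host] = rp
        return self.parsers[host].can_fetch(self.user_agent, path)

    def wait_time(self, host, now):
        # Seconds to sleep before hitting this host again.
        last = self.last_hit.get(host)
        self.last_hit[host] = now
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (now - last))

robots = "User-agent: *\nDisallow: /private/\n"
f = PoliteFetcher(min_delay=2.0)
print(f.allowed("example.org", "/blog/post", robots))   # True
print(f.allowed("example.org", "/private/x", robots))   # False
print(f.wait_time("example.org", now=100.0))            # 0.0 (first hit)
print(f.wait_time("example.org", now=101.0))            # 1.0 (2.0 - 1.0 elapsed)
```

Many of the new AI scrapers skip both steps, which is a big part of why small ForgeJo instances and blogs feel the load in a way they never did from Googlebot.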