everforward 3 hours ago

I think it’s a) the sheer volume of scrapers, b) a desire for _all_ content instead of particular content, and c) the scrapers being new, without the decades of patches Googlebot et al. have accumulated.

5 years ago there were few people with an active interest in scraping ForgeJo instances and personal blogs. Now there are a bajillion companies and individuals getting data to train a model or throw in RAG or whatever.

Having a better scraper means more data, which means a better model (handwavily) so it’s a competitive advantage. And writing a good, well-behaved distributed scraper is non-trivial.
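To make "well-behaved" concrete, a minimal sketch in Python of the two politeness basics a good scraper needs: honoring robots.txt and rate-limiting per host. The class and parameter names here are illustrative, not any real crawler's API; the distributed coordination part (sharing per-host state across many workers) is exactly what makes the real thing non-trivial.

```python
import urllib.robotparser

class PoliteFetcher:
    """Illustrative sketch: respects robots.txt and spaces out
    requests to each host. Not a real crawler's interface."""

    def __init__(self, user_agent="example-bot", min_delay=1.0):
        self.user_agent = user_agent
        self.min_delay = min_delay   # seconds between hits to one host
        self.parsers = {}            # host -> cached RobotFileParser
        self.last_hit = {}           # host -> timestamp of last request

    def allowed(self, host, path, robots_txt):
        # Parse (and cache) this host's robots.txt, then ask it about path.
        if host not in self.parsers:
            rp = urllib.robotparser.RobotFileParser()
            rp.parse(robots_txt.splitlines())
            self.parsers[host] = rp
        return self.parsers[host].can_fetch(self.user_agent, path)

    def wait_time(self, host, now):
        # Seconds to sleep before hitting this host again.
        last = self.last_hit.get(host)
        self.last_hit[host] = now
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (now - last))

robots = "User-agent: *\nDisallow: /private/\n"
f = PoliteFetcher(min_delay=2.0)
print(f.allowed("example.org", "/blog/post", robots))   # True
print(f.allowed("example.org", "/private/x", robots))   # False
print(f.wait_time("example.org", now=100.0))            # 0.0 (first hit)
print(f.wait_time("example.org", now=101.0))            # 1.0 (2.0 - 1.0 elapsed)
```

Many of the new AI scrapers skip both steps, which is a big part of why small ForgeJo instances and blogs feel the load in a way they never did from Googlebot.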