gamesieve 3 hours ago
I suspect the redundant requests are primarily designed to weed out poisoned data served on otherwise valid URLs. I've also seen the redundant requests increase massively the more sources I blocked at the firewall level, so it feels like they're pre-emptively overcompensating for some percentage of requests being blocked.

My website contains ~6000 unique data points in effectively infinite combinations on effectively infinite pages. Some of those combinations are useful for humans, but the AI scrapers could gain a near-infinite efficiency improvement by just identifying as a bot and heeding my robots.txt and/or rel="nofollow" hints to access the ~500 top-level pages, which contain close to everything that is unique.

They just don't care. All their efficiency efforts are directed solely toward bypassing blocks. (Today I saw them varying the numbers in their user agent strings: X15 rather than X11, Chrome/532 rather than Chrome/132, and so on...)
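Incidentally, that kind of number-fuzzing is easy to flag, because the mutated tokens fall outside any plausible range. A minimal sketch (the threshold, function name, and the assumption that only "X11" is a legitimate X Window token are mine, not anything specific to my setup):

```python
import re

# Assumption for illustration: real Chrome major versions are currently
# three digits well under 200, so anything like "Chrome/532" is fabricated.
MAX_PLAUSIBLE_CHROME_MAJOR = 200

def looks_mutated(user_agent: str) -> bool:
    """Heuristically flag UA strings with implausible version tokens."""
    m = re.search(r"Chrome/(\d+)", user_agent)
    if m and int(m.group(1)) > MAX_PLAUSIBLE_CHROME_MAJOR:
        return True
    # Desktop Linux UAs use the literal token "X11"; variants such as
    # "X15" or "X12" don't occur in genuine browsers.
    if re.search(r"\bX1[02-9]\b", user_agent):
        return True
    return False
```

Of course this only catches the lazy mutations; the moment you deploy it they'll start varying some other token instead.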