Ronsenshi 8 hours ago

One thing about Google is that many anti-scraping services explicitly allow access to Google and maybe a couple of other search engines. Everybody else gets to enjoy the Cloudflare captcha, even when crawling at reasonable speeds.

Rules For Thee but Not for Me

chii 8 hours ago | parent | next [-]

> many anti-scraping services explicitly allow access to Google and maybe a couple of other search engines.

because Google (and the couple of other search engines) provide enough value to offset their crawlers' resource consumption.

JasonADrury 5 hours ago | parent [-]

That's cool, but it's impossible for anyone to ever build a competitor that'd replace Google without bypassing such services.

ehhthing 7 hours ago | parent | prev | next [-]

You say this like robots.txt doesn't exist.

toofy an hour ago | parent | next [-]

it almost sounds like they’re saying the contents of robots.txt shouldn’t matter… because google exists? or something?

implying “robots.txt explicitly says i can’t scrape their site, well i want that data, so im directing my bot to take it anyway.”

sitzkrieg 9 minutes ago | parent | prev [-]

so many things flat out ignore it in 2026 let's be real

ErroneousBosh an hour ago | parent | prev [-]

Why are you scraping sites in the first place? What legitimate reason is there for you doing that?