schoen 2 days ago

Is there any way to find patterns in what doesn't make it into Common Crawl, and perhaps help them become more comprehensive?

Hopefully it's not that people are deliberately allowing the Google crawler while excluding Common Crawl in robots.txt?
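
For what it's worth, one rough way to check this for a given site is to run its robots.txt through Python's standard `urllib.robotparser` and compare the answer for the two crawlers. This is only a sketch: `crawler_access` and the `sample` rules are made up for illustration, and it assumes the relevant user-agent tokens are `Googlebot` and `CCBot` (the name Common Crawl's crawler uses).

```python
from urllib.robotparser import RobotFileParser


def crawler_access(robots_txt, url, agents=("Googlebot", "CCBot")):
    """Return {agent: allowed?} for a URL under the given robots.txt text.

    Hypothetical helper for illustration; the agent tokens are assumptions.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed here.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Made-up robots.txt showing the pattern in question:
# Googlebot is allowed everywhere, CCBot is excluded everywhere.
sample = """\
User-agent: Googlebot
Allow: /

User-agent: CCBot
Disallow: /
"""

if __name__ == "__main__":
    print(crawler_access(sample, "https://example.com/page"))
```

Running something like this over the robots.txt files of a list of domains would at least show how common the "Google yes, Common Crawl no" pattern is, though it says nothing about pages that are missing for other reasons.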