veunes 16 hours ago

So basically Common Crawl is a data laundromat for Big Tech. They outsource their dirty and ethically questionable data collection to a "non-profit," and then act like they're just "researchers" using an "open" dataset. Those "donations" from OpenAI and Anthropic are just payment for plausible deniability.