gradientsrneat a day ago

After reading about what Rupert Murdoch did in Australia to try to claw money from search engines for simply indexing pages from news websites, I do understand that it's possible to go too far in favor of the "news" organizations (whether they are reputable or not).

I don't think the LLM companies are fully innocent here, to be fair.

gradientsrneat a day ago | parent [-]

I will add, the news websites didn't start paywalling just because of LLM scrapers. They started doing it after certain parts of the GDPR passed, because they could no longer sustain themselves as much from targeted advertising and data sales. News has supplemented itself with advertising for a long time, but targeted advertising has sadly become perceived as mandatory by advertising companies, even before Google's dominance.

veunes 16 hours ago | parent [-]

So publishers are in a no-win situation: if they lock down their content completely (server-side paywall), they disappear from Google Search and lose traffic. If they keep the "leaky" paywall, their content gets hoovered up for free by Common Crawl to train models that will then compete directly against them. They're trapped.