derefr 6 hours ago

I wonder if these publishers would be more amenable to a private archiver that only serves registered academic / journalistic research projects (the way most physical private archives do), with a specific provision to never provide data to companies that would resell it or use it for training of generative models.

eternauta3k 5 hours ago

They already have archives with online and printed articles which they license to libraries, because the libraries take care of rate limiting and limiting abuse.

coffeefirst 4 hours ago

Yes. Most publishers already do syndication deals. This is a fine idea.

The problem with the LLMs is they capture the value chain and give back nothing. It didn’t have to be this way. It still doesn’t.

ninjagoo 5 hours ago

They probably have internal archives if they're smart, but those aren't accessible to the public. I think the issue isn't whether the data is archived, but whether that information stays available to the public for the foreseeable future.

g-b-r 5 hours ago

They sure have archives of the newspapers, but they're much less likely to have archives of what they publish online.

And a local archive is one fire, business decision, poor technical choice, etc. away from getting permanently lost.