pkamb 5 days ago
Does Anna's Archive or a similar site host, say, the complete New York Times (pre-1930) as a full PDF download set? And every other newspaper too? Tons of public domain sources are locked into websites like Newspapers.com or the nearly-dead and now completely unsearchable old Google News / Newspaper. It would be nice if the massive pursuit of AI training data resulted in some fully-legal open source alternatives to these proprietary, outdated, or abandoned sites. I know some of it is available via the Internet Archive, etc., but something new with an AI-powered search and finding aid sounds so useful.
lioeters 5 days ago
> complete New York Times (pre-1930)

https://archive.org/search?query=title%3ANew+York+Times&sort...

> as a full PDF download set

I imagine it's possible to achieve this through torrents from Anna's, but you'd have to search and compile the list of all individual PDFs.

> something new with an AI-powered search

With enough time and willingness, someone could put all the old NYT issues through optical character recognition and convert them to text; then make it available to large language models for semantic search of some kind. Ideally public cultural funds could support the effort as academic research.
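For what it's worth, the search half of that idea can be prototyped in a few lines once the OCR text exists. Below is a rough sketch, not a real pipeline: it assumes the OCR step has already been done by some external tool and that each page's text is sitting in a dict. The page ids and sample text are made up for illustration, and plain TF-IDF with cosine similarity stands in for the "semantic search" part (a real system would more likely use embeddings from a language model).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (crude OCR cleanup)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages):
    """Build normalized TF-IDF vectors from a dict of page id -> OCR text."""
    counts_by_page = {pid: Counter(tokenize(text)) for pid, text in pages.items()}

    # Document frequency: in how many pages does each term appear?
    doc_freq = Counter()
    for counts in counts_by_page.values():
        doc_freq.update(counts.keys())

    n_pages = len(pages)
    idf = {t: math.log((1 + n_pages) / (1 + d)) + 1.0 for t, d in doc_freq.items()}

    vectors = {}
    for pid, counts in counts_by_page.items():
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[pid] = {t: w / norm for t, w in vec.items()}
    return idf, vectors


def search(query, idf, vectors, top_k=3):
    """Return up to top_k (page id, score) pairs ranked by cosine similarity."""
    counts = Counter(t for t in tokenize(query) if t in idf)
    qvec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in qvec.values())) or 1.0

    scored = []
    for pid, vec in vectors.items():
        score = sum((w / norm) * vec.get(t, 0.0) for t, w in qvec.items())
        if score > 0:
            scored.append((score, pid))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(pid, round(score, 3)) for score, pid in scored[:top_k]]


# Hypothetical OCR output; ids and text are invented for this example.
PAGES = {
    "sample-page-1": "The steamship arrived in the harbor after a storm at sea",
    "sample-page-2": "City council votes on the new subway line and bridge tolls",
    "sample-page-3": "Wheat prices rise as the harvest falls short in the west",
}

idf, vectors = build_index(PAGES)
print(search("steamship harbor", idf, vectors))
```

This only matches on shared words, so it won't find a page about "ocean liners" from a query about "steamships"; that gap is exactly where an embedding model would come in.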