hu3 22 minutes ago
> The system ingests raw newspaper scans and uses a multi-step LLM pipeline to generate the daily edition

This is neat! But I wonder about the longevity of the project if it relies on scanning newspapers. Do you have an endless supply? Perhaps there is some digital archive you could use?
culi 9 minutes ago
My Wikipedia Library membership gives me access to some cool resources that might be of interest:

- Arcanum is the largest and continuously expanding digital periodical database from Eastern Europe, containing scientific and specialized journals, encyclopaedias, weekly and daily newspapers, and more.
- NewspaperARCHIVE.com is an online database of digitized newspapers with over 2 billion news articles; coverage extends from 1607 to the present and spans the US, Canada, the UK, and 20 other countries.
- Newspapers.com includes more than 800 million pages from 20,000+ newspapers. The collection includes some major newspapers for limited periods (e.g., the first 72 years of the New York Times), but mostly consists of US regional papers from the 1700s to the late 1980s. Free accounts through the Wikipedia Library include access to Newspapers.com Publisher Extra content.
- ProQuest is a multidisciplinary research provider. This access includes ProQuest Central (a large collection of journals and newspapers), Literature Online, the HNP Chinese Newspaper Collections, and the Historical New York Times.
- Wikilala is a digital repository of more than 109,000 printed documents, including 45,000 newspapers, 32,000 journals, 4,000 books, and 26,000 articles concerning the history of the Ottoman Empire from its founding to modern times.

Also, most newspapers maintain their own archives, usually accessible online. Here are some I have access to:

- The Corriere della Sera (one of Italy's oldest and most-read newspapers)
- The Corriere della Sera (a century of historical archives)
- The Times of Malta (founded in 1935, it is the oldest daily newspaper still in circulation in Malta)
- ZEIT ONLINE (the online version of Die Zeit, a German weekly newspaper)

...and quite a few more.
andix 6 minutes ago
Copyright issues will stop this soon. Forty-year-old newspaper articles aren't in the public domain yet in most countries. A gray area could be a newspaper that went out of business decades ago, or perhaps a government-run newspaper whose content was public domain in the first place.