| ▲ | ahaspel 9 hours ago |
| I rebuilt the 1911 Encyclopædia Britannica into a clean, structured, navigable site: https://britannica11.org/
What it does:
– ~37k articles reconstructed from the original volumes
– section-level structure (contents are clickable within articles)
– cross-references extracted and linked
– contributors indexed and searchable
– original volume + page references preserved and shown while reading
– links to the original scans for each page
– ancillary material included (prefaces, abbreviations, etc.)
– topic index reproduced and cross-linked
– full-text search with article metadata (length, volume, etc.)
Most of the work was in parsing and reconstruction: headings, multi-page articles, tables, math, languages, footnotes, plates, and all the small edge cases that come up in a work like this.
The goal was to make something that feels like the original, but is actually usable.
I’d especially appreciate feedback on:
– search quality
– navigation (sections, cross-references)
– anything that looks structurally off
Happy to answer questions about the pipeline or data model. |
|
| ▲ | zozbot234 7 hours ago | parent | next [-] |
| You might want to add The Reader's Guide to the Encyclopaedia Britannica, PD text available at https://www.gutenberg.org/ebooks/74039 and scans at https://archive.org/details/readersguidetoen00londuoft - It would fit naturally with the Ancillary material that includes the topic-based index. |
| |
| ▲ | ahaspel 7 hours ago | parent [-] | | It would indeed. I will see about working this in; it's highly pertinent. |
|
|
| ▲ | nyc_pizzadev 6 hours ago | parent | prev | next [-] |
| Very nice. I actually spent a bit of time browsing a few topics, which is something I rarely do these days! A few things... when I click an article and try to jump to a new topic, the top search box (labeled "Search titles and full text...") doesn't work. Second, when I first came to the site, I was a bit stuck. It took a bit of time to realize I need to click on "Articles" or even "Topics" to start browsing. Not sure why, maybe I expected the image to let me enter the site somehow...? |
|
| ▲ | ks2048 4 hours ago | parent | prev | next [-] |
| Nice job. How about Wikipedia-style links to other articles for topics mentioned within another article? |
|
| ▲ | logicallee 9 hours ago | parent | prev | next [-] |
| Thanks so much for sharing this. It looks fantastic. A couple of questions, if you don't mind: what license are you releasing this under, if any? Is there any way to download it? The reason someone might want to download it is for use as training data. |
| |
| ▲ | zozbot234 8 hours ago | parent | next [-] | | Wikisource has the original scans available in the public domain, and their enriched text under CC-BY-SA: https://en.wikisource.org/wiki/EB1911 | |
| ▲ | ahaspel 9 hours ago | parent | prev | next [-] | | Thanks! The underlying text (1911 edition) is public domain, but the structured version here — the parsing, reconstruction, and linking — is something I put together for this site. Right now there isn’t a bulk download available. I’m considering exposing structured access (API or dataset) in some form, but haven’t decided exactly how that will work yet. If you have a specific use case in mind (especially for training), I’d be interested to hear more. | | |
| ▲ | hallole 8 hours ago | parent | next [-] | | I've wanted to do something like this for The Encyclopédie, a hugely relevant text to the Enlightenment. If you ever get around to adding a rough "How I (generally) Made This" section, that'd be appreciated! Site looks great :) | |
| ▲ | logicallee 9 hours ago | parent | prev [-] | | Regarding the specific use case, I was thinking this: I had Gemma 4 (a small but highly capable offline model released by Google) make a public-domain (CC0) encyclopedia of some core science and technology concepts[1]. I thought it was pretty good. Separately, I've fine-tuned the Gemma 4 model[2]; it was very quick (just 90 seconds), so I think it could be interesting to train it to talk like the 1911 Encyclopædia Britannica. I would use the entries as training data and train it to talk in the same style. There isn't a specific use case; I just think it would be interesting. For example, I could see how it writes about modern concepts in the style of the 1911 Britannica. [1] https://stateofutopia.com/encyclopedia/ [2] To talk like a pirate! https://www.youtube.com/live/WuCxWJhrkIM | | |
| ▲ | ahaspel 8 hours ago | parent [-] | | That’s a fun idea — I can see the appeal of that style. The underlying text is public domain, but the structured version here is something I put together for the site. I haven’t released a bulk dataset yet. If you end up experimenting with it, I’d love to hear how it turns out — and I’m still figuring out what structured access might look like. |
|
| |
| ▲ | realityfactchex 9 hours ago | parent | prev [-] | | > Is there any way to download it? The reason someone might want to download it is for use as training data. Another reason would be to be able to keep running/using it even if the main site were eventually to go down for whatever reason, or to operate a mirror of it for redundancy (linking back to the original, of course). |
|
|
| ▲ | gnerd00 9 hours ago | parent | prev | next [-] |
| Legal terms question here also: several major world economies are operating under very different rules regarding datasets and publication rights. I am in the USA (California). Will there be terms for me, given that I am not a giant deep-pockets FAANG, just a book person? Commercial use terms at "small business" scale? |
| |
| ▲ | ahaspel 9 hours ago | parent | next [-] | | The 1911 text itself is public domain, so anyone is free to use it. What I’ve built here is a structured edition — the parsing, reconstruction, linking, indexing, etc. I haven’t published a formal license for that yet. For casual or small-scale use there’s no issue at all. For bulk use (e.g. dataset / training / redistribution), I’d prefer people get in touch so I can figure out a sensible way to support that. | |
| ▲ | dessimus 5 hours ago | parent | prev | next [-] | | It's been on Project Gutenberg for over 20 years: https://www.gutenberg.org/ebooks/13600 They only release books that are in the public domain. | |
| ▲ | TremendousJudge 9 hours ago | parent | prev [-] | | I guess such an old edition is in the public domain |
|
|