| ▲ | logicallee 9 hours ago |
| Thanks so much for sharing this. It looks fantastic. A couple of questions, if you don't mind: what license are you releasing this under, if any? Is there any way to download it? The reason someone might want to download it is for use as training data. |
|
| ▲ | zozbot234 8 hours ago | parent | next [-] |
| Wikisource has the original scans available in the public domain, and their enriched text under CC-BY-SA: https://en.wikisource.org/wiki/EB1911 |
|
| ▲ | ahaspel 9 hours ago | parent | prev | next [-] |
| Thanks! The underlying text (1911 edition) is public domain, but the structured version here — the parsing, reconstruction, and linking — is something I put together for this site. Right now there isn’t a bulk download available. I’m considering exposing structured access (API or dataset) in some form, but haven’t decided exactly how that will work yet. If you have a specific use case in mind (especially for training), I’d be interested to hear more. |
| |
| ▲ | hallole 8 hours ago | parent | next [-] | | I've wanted to do something like this for The Encyclopédie, a hugely relevant text to the Enlightenment. If you ever get around to adding a rough "How I (generally) Made This" section, that'd be appreciated! Site looks great :) | |
| ▲ | logicallee 9 hours ago | parent | prev [-] | | Regarding the specific use case, I was thinking this: I had Gemma 4 (a small but highly capable offline model released by Google) make a public domain cc0 encyclopedia of some core science and technology concepts[1]. I thought it was pretty good. Separately, I've fine-tuned the Gemma 4 model[2], it was very quick (just 90 seconds), so I think it could be interesting to train it to talk like 1911 Encyclopedia Britannica. I would use the entries as training data and train it to talk in the same style. There isn't a specific use case for why, I just think it would be interesting. For example, I could see how it writes about modern concepts in the style of 1911 Britannica. [1] https://stateofutopia.com/encyclopedia/ [2] To talk like a pirate! https://www.youtube.com/live/WuCxWJhrkIM | | |
| ▲ | ahaspel 8 hours ago | parent [-] | | That’s a fun idea — I can see the appeal of that style. The underlying text is public domain, but the structured version here is something I put together for the site. I haven’t released a bulk dataset yet. If you end up experimenting with it, I’d love to hear how it turns out — and I’m still figuring out what structured access might look like. |
|
|
|
| ▲ | realityfactchex 9 hours ago | parent | prev [-] |
| > Is there any way to download it? The reason someone might want to download it is for use as training data. Another reason would be to able to keep running/using it even if the main site were to go down for whatever reason eventually; or, to operate a mirror of it, for redundancy (linking back to the original, of course). |