| ▲ | dogline 7 hours ago | ||||||||||||||||
Also, just to clarify, I scanned all 7488 pages in personally (Fujitsu ScanSnap ix500). With Claude's help, I found some undocumented SANE features to auto crop and fix the scans, then had a Python script in Linux auto scan them and put them into a Postgres database as I went. Other scripts would add transcription, summaries, and auto index everything. "mistral-ocr-latest" did a really good job transcribing the handwriting, considering how tight and small some of it is. Then back to Claude API calls to summarize by month and collect people and places from all of the entries. Claude then created static html pages from what started as a Flask app. Published on Dreamhost. | |||||||||||||||||
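For anyone wanting to try the same "store each page as it is scanned, enrich it later" flow, here is a minimal sketch, not the actual scripts. It assumes `sqlite3` as a stand-in for Postgres so it runs without a server, and the table layout, the function names, and the `transcribe` callable are all hypothetical; in a real pipeline that callable would wrap whatever OCR service you use.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the page store (SQLite here as a stand-in for Postgres)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pages (
               page_no       INTEGER PRIMARY KEY,
               image_path    TEXT NOT NULL,
               transcription TEXT
           )"""
    )
    return conn


def record_scan(conn, page_no, image_path):
    """Register a freshly scanned page; re-running for the same page is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO pages (page_no, image_path) VALUES (?, ?)",
        (page_no, image_path),
    )
    conn.commit()


def transcribe_pending(conn, transcribe):
    """Fill in transcriptions for pages that lack one.

    `transcribe` is a hypothetical callable: image path in, text out.
    Returns the number of pages processed in this pass.
    """
    rows = conn.execute(
        "SELECT page_no, image_path FROM pages "
        "WHERE transcription IS NULL ORDER BY page_no"
    ).fetchall()
    for page_no, image_path in rows:
        conn.execute(
            "UPDATE pages SET transcription = ? WHERE page_no = ?",
            (transcribe(image_path), page_no),
        )
    conn.commit()
    return len(rows)
```

Because the enrichment pass only touches rows where `transcription` is still NULL, it can be re-run after every scanning session without redoing earlier pages. A summary or indexing pass could follow the same pattern with its own column.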
| ▲ | zzleeper 3 minutes ago | parent | next [-] | ||||||||||||||||
That's amazing! I'm working on a kinda similar project (documenting bank runs from historical newspapers) and also opted for Claude to build a static website. Crazy that the two sites have a very similar look and feel: https://www.finhist.com/bank-runs/index.html . The only big difference is that mine lacks a map, which I should hopefully fix soon (I already have lat and lon and am linking to Google Maps). PS: Do you know if Mistral works better at OCRing handwritten text than Gemini 3? I was planning on going the Gemini 3 route for another project. | |||||||||||||||||
| ▲ | dogline 6 hours ago | parent | prev | next [-] | ||||||||||||||||
Oh boy. #3 on front page, 19k page hits in the first hour. 8243 static html pages, 15728 webp images (10k-50k each). I've never had one of my sites with this much traffic. With everything as static files, website is still holding. Thank you all. | |||||||||||||||||
| ▲ | beej71 2 hours ago | parent | prev [-] | ||||||||||||||||
This is great! I love it when people take bits of history that would otherwise be forgotten and put them out in the world (to be further vacuumed up by Internet Archive). Thank you for doing it. | |||||||||||||||||