| ▲ | giancarlostoro 6 hours ago | |||||||
Friendly reminder that ArchiveBox exists to let you self-host your own archive service. https://github.com/ArchiveBox/ArchiveBox I dream of a day when ArchiveBox becomes a fleet of homelabs all over the world, making it drastically harder to block them all. | ||||||||
| ▲ | nikisweeting 29 minutes ago | parent | next [-] | |||||||
I've been mulling over how to take ArchiveBox in this direction for years, but it's a really hard problem to tackle because of privacy. https://docs.sweeting.me/s/cookie-dilemma

Most content is going behind logins these days, and if you include the PII of the person doing the archiving in the archives, then it's A. really easy for providers to block that account, and B. potentially dangerous, because it doxxes the person doing the archiving.

The problem with removing PII from logged-in sites is that it's not as simple as stripping some EXIF data: the HTML and JS are littered with secret tokens, usernames, user-specific notifications, etc. that would reveal the identity of the archivist and can't be removed without breaking page behavior on replay.

My latest progress is that it might be possible to anonymize logged-in snapshots by using the intersection of two different logged-in snapshots, making them easier to share over a distributed system like BitTorrent or IPFS without doxxing the archivist. More here: https://github.com/pirate/html-private-set-intersection | ||||||||
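To make the intersection idea concrete, here is a minimal sketch in Python. It is an assumption-laden illustration, not the method used in the linked repository: it treats each snapshot as a list of lines and keeps only the lines that both accounts saw, in order. The function name `shared_lines` and the two sample pages are hypothetical. Real pages embed identifiers mid-line, inside scripts, and in reordered markup, so a line-level diff like this would miss a lot.

```python
from difflib import SequenceMatcher


def shared_lines(snapshot_a: str, snapshot_b: str) -> str:
    """Return only the lines that appear, in the same order, in both snapshots."""
    a = snapshot_a.splitlines()
    b = snapshot_b.splitlines()
    # autojunk=False so frequently repeated lines are not silently ignored.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    kept = []
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical lines; the final block has size 0.
        kept.extend(a[block.a:block.a + block.size])
    return "\n".join(kept)


# Hypothetical snapshots of the same page, captured by two different accounts.
SNAPSHOT_ALICE = """<html>
<h1>Forum thread</h1>
<span class="user">alice</span>
<p>Post body</p>
<input name="csrf" value="tok-111">
</html>"""

SNAPSHOT_BOB = """<html>
<h1>Forum thread</h1>
<span class="user">bob</span>
<p>Post body</p>
<input name="csrf" value="tok-222">
</html>"""

if __name__ == "__main__":
    print(shared_lines(SNAPSHOT_ALICE, SNAPSHOT_BOB))
```

In this toy case the username and CSRF token lines differ between the two accounts, so they drop out while the shared page content survives. Anything the two accounts happen to have in common would still pass through, and, as noted above, stripping tokens this way may break page behavior on replay.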
| ▲ | e2le 22 minutes ago | parent | prev | next [-] | |||||||
Out of curiosity, does ArchiveBox integrate some way of verifying that the contents of the archived page(s) are legitimate and unmodified? | ||||||||
| ▲ | codedokode 6 hours ago | parent | prev [-] | |||||||
I think the opposite will happen: people reading in the news that the FBI is after archiving sites will not want to launch their own, except maybe the radical types. | ||||||||