Remix.run Logo
ronsor 3 hours ago

The thing about archives is you either parse them now or parse them later. With how much JS and other crap is served in modern social media frontends, I'm not sure WARC is the best format for archiving from them.

ElectricalUnion 3 hours ago | parent [-]

But that is the point of WARC: otherwise, your archival method need some sort of general inteligence (ai or human behind the scenes) to store exacly what you need.

With WARC (and good WARC tooling like Browsetrix-crawler) you store everything HTTP the site sent.