pards 7 hours ago

>> But because it can also be used to bypass paywalls

> How? Does the site pay for subscription for every newspaper?

Someone with a subscription logs into the site, then archives it. Archive.is uses the current user's session and can therefore see the paywalled content.

mojosam 6 hours ago

> Someone with a subscription logs into the site, then archives it.

That’s not the case. I don’t have a NYT subscription. I Googled for an obscure 1989 article on pork bellies that I thought archive.today was unlikely to have cached. Sure enough, when I asked for that article, it didn’t have it and began the caching process. A few minutes later it came up with the webpage, and if you visit it on archive.is you can see it was first cached just a few minutes ago.

https://www.nytimes.com/1989/11/01/business/futures-options-...

My assumption has been that the NYT is letting them around the paywall, much like the unrelated Wayback Machine. How else could this be working? The only ways I can think of are that either they have access to a NYT account and are caching with it, which I suspect the NYT would notice and shut down, or there is a documented hole in the paywall that they are exploiting (and they are not going through the Wayback Machine, since the caching process shows they are pulling directly from the NYT).

codedokode 7 hours ago

Do they have such an option? I don't see it on the site, and the browser extension seems to send only the URL [1] to the server. Can you provide more information?

[1] https://github.com/JNavas2/Archive-Page/blob/main/Firefox/ba...

madeforhnyo 5 hours ago

I believe news sites let crawlers access the full articles for a short period of time, so that they appear in search results. Archive.is crawls during that short window.

rkagerer 7 hours ago

Does it still leak your IP, e.g. if the page rendered by the site you're archiving includes it? You'd think they'd create a simple filter to redact that out.
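For what it's worth, the kind of filter being asked about could be as small as the sketch below. It is only an illustration, not how archive.is works: the function name `redact_ip` and the placeholder string are made up, and it assumes an IPv4 address that appears verbatim in the page text.

```python
import re


def redact_ip(page_text: str, client_ip: str, placeholder: str = "[redacted]") -> str:
    """Replace every standalone occurrence of client_ip in page_text.

    Hypothetical helper: assumes an IPv4 address written in dotted form.
    """
    # re.escape makes the dots literal; the digit lookarounds stop
    # "1.2.3.4" from matching inside "11.2.3.45".
    pattern = re.compile(r"(?<!\d)" + re.escape(client_ip) + r"(?!\d)")
    return pattern.sub(placeholder, page_text)


if __name__ == "__main__":
    # 203.0.113.7 is from a range reserved for documentation examples.
    print(redact_ip("Your IP is 203.0.113.7.", "203.0.113.7"))
```

A filter like this would miss an IP that the page encodes differently (hex, reversed, inside an image), so it only covers the simple case.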

itopaloglu83 7 hours ago

I’m not advocating for it, but:

Websites like newspapers might soon put indicator words on the page, not just simple subscriber numbers that can be replaced, to show who is viewing the page, and those words would make their way into archives.
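To make the "indicator words" idea concrete, here is a toy sketch of one way it could work: each bit of a subscriber ID picks one of two interchangeable words, so the wording itself identifies the viewer. Everything here is assumed for illustration (the synonym pairs, the template sentence, and the function names `watermark` and `recover_id`); no newspaper is known to do exactly this.

```python
import re

# Hypothetical synonym slots: (variant for bit 0, variant for bit 1).
SYNONYM_SLOTS = [
    ("said", "stated"),
    ("big", "large"),
    ("began", "started"),
    ("often", "frequently"),
]

# Hypothetical article sentence with one placeholder per slot.
TEMPLATE = "The minister {0} the {1} project {2} late and {3} stalls."


def watermark(template: str, subscriber_id: int) -> str:
    """Pick one variant per slot from the bits of subscriber_id."""
    words = [pair[(subscriber_id >> i) & 1] for i, pair in enumerate(SYNONYM_SLOTS)]
    return template.format(*words)


def recover_id(text: str) -> int:
    """Rebuild the ID by checking which bit-1 variants appear in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sum(1 << i for i, (_, one) in enumerate(SYNONYM_SLOTS) if one in tokens)


if __name__ == "__main__":
    page = watermark(TEMPLATE, 5)
    print(page)
    print(recover_id(page))
```

With four slots this distinguishes only 16 viewers. Unlike a visible subscriber number, the marker survives in an archived copy unless someone rewrites the affected words.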