nikanj 9 hours ago

archive.is is frequently used to bypass paywalls; I wonder if this is somehow motivated by that.

nacozarina 8 hours ago

DOGE is busy replacing official US govt websites, and does not want anyone bringing up the past.

jdiff 8 hours ago

If they were any good at it, they'd have blocked the Internet Archive via robots.txt. For some inexplicable reason, IA responds to that by wiping out past, present, and future archives of that site. They haven't taken that easy step, so I doubt they'd take the further, more involved step of going after this smaller actor.

codedokode 7 hours ago

IA also blocks some content in Russia. For example, this page [1] says: "This URL has been excluded from the Wayback Machine in your region." I was genuinely surprised to learn that while they pay little attention to US copyright law, they show great respect for messages from Vladimir.

(In case anyone is curious what the article is about: it is a fictional comparison of the life of a fictional character in Springfield, USA and in Chusovoy, USSR in the 80s, and I can't even understand why it was banned in Russia.)

[1] https://web.archive.org/web/20250418160713/https://habr.com/...

wl 7 hours ago

The Wayback Machine has ignored robots.txt for a few years at this point. The only way to get them to stop scraping or remove content is by asking them directly.

pogue 9 hours ago

That's most likely the reason pressure is being put on them. Big media companies successfully shut down 12ft.io, which was used to bypass paywalls, and forced the BPC (Bypass Paywalls Chrome) browser extension off the Mozilla extension store, then GitLab, then GitHub. Now the dev is hosting it on a Russian GitHub clone, presumably making it untouchable.

Since archive[.]today uses some very obscure hosting methods with multiple international mirrors, it is incredibly difficult for law enforcement to go after.

pimlottc 8 hours ago

What obscure methods are they using?

pogue 8 hours ago

I guess it might fall under a bulletproof hosting type of setup. [1] Many people have investigated who is actually behind archive[.]today, how they're continually able to bypass the paywalls of paid sites, and how they keep operating such a large infrastructure with no apparent source of income.

There was quite a good article posted here on HN about someone trying to figure out those questions, but I can't seem to find it.

[1] https://en.wikipedia.org/wiki/Bulletproof_hosting

stef25 7 hours ago

Isn't it just a matter of pretending to be a search bot? Sites let Googlebot bypass the paywall so the content gets indexed.

input_sh 6 hours ago

You could easily test your hypothesis yourself. It's not gonna work very well.

mattmaroon 9 hours ago

100%. It's like Lenin said, you look for the person who will benefit… and, uh, uh, you know… You know, you'll uh, uh—well, you know what I'm trying to say…

pimlottc 8 hours ago

I’m not sure what you’re referencing, but the principle goes way back to the Romans: Cui bono? [0]

[0] https://en.wikipedia.org/wiki/Cui_bono%3F

trevithick 8 hours ago

They're referencing this: https://m.youtube.com/watch?v=HlZhPuDYqbU

_blk 6 hours ago

It's a valid way to look for probable cause, but it's important to distinguish between "you know" and "you assume". I'm all for accountability, but most conspiracy theories thrive precisely because of that sort of framing.

As to Lenin: the mouse died because it didn't understand why the cheese was free.