elzbardico 2 hours ago:
> which is, of course, ridiculous.

Why? In the world of web scraping this is pretty common.
xurukefi 2 hours ago:
Because it works too reliably. Imagine what that would entail: managing thousands of accounts, and making sure the account details are stripped from every archived page perfectly. Every time a website changed its code even slightly, you would risk losing one of those accounts. It would constantly break and would be an absolute nightmare to maintain. I've personally never seen it fail on a paywalled news article; archive.today has given me a clean, non-paywalled version every single time. Maybe they use accounts for a few special sites, but there is definitely some automated, generic magic happening that bypasses news outlets' paywalls. Probably something Googlebot-related, because those sites usually serve Google their news pages without a paywall, presumably for SEO reasons.
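A minimal sketch of the Googlebot trick being speculated about here (Python; the URL is a placeholder, and note that many outlets verify Googlebot by reverse DNS, so a spoofed User-Agent alone won't always work):

    import urllib.request

    # Googlebot's published User-Agent string; sites that unlock
    # articles for Google's crawler key off this header.
    GOOGLEBOT_UA = (
        "Mozilla/5.0 (compatible; Googlebot/2.1; "
        "+http://www.google.com/bot.html)"
    )

    # Placeholder URL for illustration only.
    req = urllib.request.Request(
        "https://news.example.com/some-paywalled-article",
        headers={"User-Agent": GOOGLEBOT_UA},
    )

    with urllib.request.urlopen(req) as resp:
        html = resp.read().decode("utf-8", errors="replace")

    # If the site trusts the header, this is the full article HTML,
    # not the paywall interstitial.
    print(html[:500])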
| |||||||||||||||||