tonymet 2 hours ago

I’m an outsider with experience building crawlers. You can get pretty far with residential proxies and browser fingerprint optimization. Most of the b-tier publishers use RBC and heuristics that can be “worked around” with moderate effort.

quietsegfault 2 hours ago | parent

... but what about subscription-only, paywalled sources?

tonymet an hour ago | parent

Many publishers offer "first one's free".

For those that don't, I would guess archive.today is using malware to piggyback off of subscriptions.