wraptile 3 days ago

> fortunately it is now easier than ever to do small-scale scraping, the kind yt-dlp does.

This is absolutely not the case. I've been web scraping since the 00s, when you could just curl any HTML or drive a browser with Selenium for simple automation. Now it's incredibly complex and expensive, even with modern tools like Playwright and all of the monthly "undetectable" flavors of it. Headless browsers are laughably easy to detect because they leak the fact that they are automated and headless. Not to mention all of the fingerprinting.
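
To make the "leak" point concrete, here is a minimal sketch of the kind of check a site could run on values reported by the client. The function name and the dict keys (`webdriver`, `user_agent`) are hypothetical, chosen only for illustration. The two signals are real: `navigator.webdriver` is true under WebDriver automation, and headless Chrome has historically put `HeadlessChrome` in its User-Agent. Real detection stacks look at far more than this.

```python
def automation_signals(client: dict) -> list[str]:
    """Return the reasons a client looks automated (empty list if none).

    `client` is a hypothetical dict of values collected in the browser,
    e.g. {"webdriver": navigator.webdriver, "user_agent": navigator.userAgent}.
    """
    reasons = []

    # The WebDriver spec exposes navigator.webdriver === true under automation.
    if client.get("webdriver") is True:
        reasons.append("navigator.webdriver is true")

    # Headless Chrome has historically advertised itself in the User-Agent.
    if "HeadlessChrome" in client.get("user_agent", ""):
        reasons.append("User-Agent contains HeadlessChrome")

    return reasons


if __name__ == "__main__":
    bot = {"webdriver": True, "user_agent": "Mozilla/5.0 HeadlessChrome/120.0"}
    print(automation_signals(bot))
```

Hiding both of these only gets a scraper past the cheapest checks; fingerprinting is a separate problem.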

sharpshadow 2 days ago | parent | next [-]

> modern browser scripting is extremely easy, so you can often scrape things with mild single-person effort even if the site is an SPA with tons of dynamic JS.

I think he means the JS part is now easy to run and scrape, compared to the transition period from basic download scraping to JS execution/headless browser scraping. It is more complex now, but a couple of years ago the tools were not as evolved as they are today.

2 days ago | parent | prev | next [-]
[deleted]
immibis 2 days ago | parent | prev | next [-]

mozilla-unified/dom/base/Navigator.cpp - find Navigator::Webdriver and make it always return false, then recompile.

johnisgood 3 days ago | parent | prev [-]

+1

I made a web scraper in Perl a few years ago. It no longer works because I need a headless browser now or whatever it is called these days.

Web scraping is MUCH WORSE TODAY[1].

[1] I am not yelling, just emphasizing. :)