andrew_zhong 3 hours ago
Good point. The anti-bot patches here (via Patchright) are about preventing the browser from being detected as automated: things like CDP leak fixes, so Cloudflare doesn't block you mid-session. It's not about bypassing access restrictions. Our main use case is retail price monitoring, i.e. comparing publicly listed product prices across e-commerce sites, which is pretty standard in the industry. But fair point, we should make that clearer in the README.
plastic041 2 hours ago
robots.txt is the most basic access restriction, and this tool doesn't even read it, while passing itself off as human[0]. It is about bypassing access restrictions.
[0]: https://github.com/lightfeed/extractor/blob/d11060269e65459e...
zendist 2 hours ago
Regardless, you should still respect robots.txt.
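For what it's worth, checking robots.txt before fetching a page takes only a few lines. Below is a minimal sketch using Python's standard `urllib.robotparser`, shown purely for illustration; it is not how the project under discussion works. The `is_allowed` helper, the `price-monitor-bot` user agent, and the example rules and URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: product pages are open, checkout pages are not.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /checkout/
"""

if __name__ == "__main__":
    for url in (
        "https://shop.example.com/products/123",
        "https://shop.example.com/checkout/cart",
    ):
        verdict = "allowed" if is_allowed(EXAMPLE_ROBOTS, "price-monitor-bot", url) else "disallowed"
        print(f"{url}: {verdict}")
```

In a real crawler you would presumably load the live file with `set_url()` and `read()` instead of passing the text in, and skip any URL for which `can_fetch()` returns False.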
messe 3 hours ago
> It's not about bypassing access restrictions.

Yes. It is. You've just made an arbitrary choice not to define it as such.