vivzkestrel 3 days ago

- i saw your other comment that talks about using an open source dataset but i had to ask

- how would you actually go about loading reviews if you really wanted to

- what kind of system would you need to work around the captcha and stuff

jmp1062 3 days ago | parent [-]

i would probably use Playwright with custom code, create chunks based on similar products, then run it on a large cluster in parallel (https://github.com/Burla-Cloud/burla).

if you have a single worker trying to scrape a shit ton of products back to back to back, you're going to get rate limited or their bot detection will catch you.
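
a rough sketch of the chunk-and-fan-out idea, with a lot of assumptions: each product is a dict with a "category" key (a stand-in for "similar products"), `fake_fetch` is a hypothetical placeholder where the Playwright page logic would go, and a local thread pool stands in for the cluster. none of these names come from Burla or Playwright, and this does nothing about captchas.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import random
import time


def chunk_by_category(products, chunk_size):
    """Group products by category, then split each group into chunks."""
    groups = defaultdict(list)
    for product in products:
        groups[product["category"]].append(product)
    chunks = []
    for items in groups.values():
        for i in range(0, len(items), chunk_size):
            chunks.append(items[i:i + chunk_size])
    return chunks


def fake_fetch(product):
    # Hypothetical placeholder: a real version would drive a browser page
    # here and return the parsed reviews for this product.
    return {"id": product["id"], "reviews": []}


def scrape_chunk(chunk, fetch, min_delay=0.0, max_delay=0.0):
    """Scrape one chunk sequentially, pausing a random interval between requests."""
    results = []
    for product in chunk:
        results.append(fetch(product))
        time.sleep(random.uniform(min_delay, max_delay))
    return results


def scrape_all(products, chunk_size=2, workers=4, fetch=fake_fetch):
    """Fan chunks out to workers; each worker only handles a small slice."""
    chunks = chunk_by_category(products, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = list(pool.map(lambda c: scrape_chunk(c, fetch), chunks))
    # Flatten the per-chunk lists back into one list of results.
    return [result for chunk in per_chunk for result in chunk]


if __name__ == "__main__":
    products = [
        {"id": 1, "category": "headphones"},
        {"id": 2, "category": "headphones"},
        {"id": 3, "category": "keyboards"},
    ]
    print(scrape_all(products))
```

the point is just that each worker sees a short list and paces itself, so no single worker hammers the site back to back. on a real cluster you would swap the thread pool for whatever parallel map the cluster library provides and pass nonzero delays.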