vivzkestrel 3 days ago
I saw your other comment about using an open source dataset, but I had to ask: how would you actually go about loading reviews if you really wanted to? What kind of system would you need to work around the CAPTCHA and similar bot checks?
jmp1062 3 days ago
I would probably use Playwright with custom code, split the products into chunks of similar items, then run those chunks in parallel on a large cluster (https://github.com/Burla-Cloud/burla). If you have a single worker trying to scrape a ton of products back to back, you're going to get rate limited or their bot detection will catch you.
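For what it's worth, here is a rough sketch of the chunk-then-fan-out idea, not anyone's actual pipeline. Everything in it is an assumption made for illustration: `chunk_by_category`, `fetch_reviews`, `process_chunk`, and `run_parallel` are hypothetical names, `fetch_reviews` is a placeholder where a real version would drive a Playwright page, and a local `ThreadPoolExecutor` stands in for the cluster rather than calling Burla.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import time


def chunk_by_category(products, max_chunk_size):
    """Group (product_id, category) pairs by category, then split each group
    into chunks of at most max_chunk_size product IDs."""
    by_category = defaultdict(list)
    for product_id, category in products:
        by_category[category].append(product_id)

    chunks = []
    for category in sorted(by_category):
        ids = by_category[category]
        for start in range(0, len(ids), max_chunk_size):
            chunks.append((category, ids[start:start + max_chunk_size]))
    return chunks


def fetch_reviews(product_id):
    """Hypothetical placeholder. A real version would load the product page
    (for example with Playwright) and parse the reviews out of it."""
    return {"product_id": product_id, "reviews": []}


def process_chunk(chunk, delay_seconds=0.0):
    """One worker handles one chunk, pausing between requests so that no
    single worker hammers the site back to back."""
    category, ids = chunk
    results = []
    for product_id in ids:
        results.append(fetch_reviews(product_id))
        time.sleep(delay_seconds)
    return category, results


def run_parallel(chunks, max_workers=4, delay_seconds=0.0):
    """Fan the chunks out across workers. The thread pool is only a local
    stand-in for a cluster; results come back in chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: process_chunk(c, delay_seconds), chunks))


if __name__ == "__main__":
    demo = [("a1", "shoes"), ("a2", "shoes"), ("a3", "shoes"), ("b1", "hats")]
    for category, results in run_parallel(chunk_by_category(demo, 2), max_workers=2):
        print(category, [r["product_id"] for r in results])
```

The point of chunking first is that each worker gets a small, bounded slice of work, so the per-worker request rate stays low even when the total job is large. Swapping the thread pool for a cluster-level map should only change `run_parallel`.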