| ▲ | jmp1062 3 days ago | |
I would probably use Playwright with custom code, split the products into chunks of similar items, then run those chunks in parallel on a large cluster (https://github.com/Burla-Cloud/burla). If you have a single worker trying to scrape a shit ton of products back to back to back, you're going to get rate limited or their bot detection will catch you. | ||
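Roughly what I mean, as a minimal sketch rather than a finished scraper. Assumptions: products arrive as hypothetical `(category, url)` pairs, "similar" just means same category, and a local thread pool stands in for the cluster (swap it for whatever parallel map your cluster tool provides). The helper names, chunk size, and delay are all made up for illustration.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

Product = Tuple[str, str]  # (category, url): a hypothetical input shape


def chunk_by_category(products: Iterable[Product], max_chunk_size: int = 50) -> List[List[str]]:
    """Group URLs by category, then split each group into bounded chunks."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for category, url in products:
        groups[category].append(url)
    chunks: List[List[str]] = []
    for urls in groups.values():
        for i in range(0, len(urls), max_chunk_size):
            chunks.append(urls[i:i + max_chunk_size])
    return chunks


def fetch_with_playwright(url: str) -> str:
    """Load one page in headless Chromium and return its HTML."""
    # Imported lazily so the chunking logic can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            return page.content()
        finally:
            browser.close()


def scrape_chunk(urls: List[str], fetch: Callable[[str], str], delay_s: float = 1.0) -> Dict[str, str]:
    """Scrape one chunk sequentially, pausing between requests."""
    results: Dict[str, str] = {}
    for url in urls:
        results[url] = fetch(url)
        time.sleep(delay_s)  # assumed polite delay; tune for the target site
    return results


def scrape_all(
    products: Iterable[Product],
    fetch: Callable[[str], str] = fetch_with_playwright,
    max_workers: int = 8,
    max_chunk_size: int = 50,
    delay_s: float = 1.0,
) -> Dict[str, str]:
    """Run every chunk on its own worker and merge the results."""
    chunks = chunk_by_category(products, max_chunk_size)
    merged: Dict[str, str] = {}
    # Local stand-in for a cluster: each chunk is an independent unit of work.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for result in pool.map(lambda c: scrape_chunk(c, fetch, delay_s), chunks):
            merged.update(result)
    return merged
```

Each chunk is self-contained, so the same `scrape_chunk` function can be shipped to remote workers unchanged. Threads on one machine still share an IP, so this alone won't dodge rate limits; the point of the cluster is that each worker makes only a small slice of the requests.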