safehuss 8 hours ago

This is awesome! Worked on something similar, albeit in a different industry.

For the more challenging scrapes, I'd highly recommend using the Chrome DevTools MCP to attach the network requests the browser makes to the site as context for your agent/LLM chat. This approach really helped me write a solid API-based scraper (also using curl_cffi) and let me drop the tedious Playwright-based approach I used to rely on.

mebkorea 7 hours ago | parent

Nice thinking. Hadn't thought of DevTools MCP that way. I've used curl_cffi for TLS fingerprinting (Edinburgh was the first one), but the discovery side I've been doing manually: open DevTools, look at the request, copy as cURL, work out which params can be pruned. Automating that loop with an LLM in the middle would speed things up a lot, especially for the bespoke long tail. Will look into that this week. Thanks!
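
For anyone curious what the "work out which params can be pruned" step could look like once automated, here is a rough sketch rather than anyone's actual tooling. It replays the captured request once as a baseline, then drops one parameter at a time and keeps the removal only if the response still matches. The names `prune_params`, `fake_fetch`, and `CAPTURED` are hypothetical, and `fake_fetch` stands in for a real HTTP call (for example one made with curl_cffi) so the snippet runs offline.

```python
from typing import Callable, Dict

Params = Dict[str, str]


def prune_params(
    params: Params,
    fetch: Callable[[Params], object],
    same: Callable[[object, object], bool] = lambda a, b: a == b,
) -> Params:
    """Greedy single pass: drop each param and keep the removal only if
    the response is still equivalent to the baseline response."""
    baseline = fetch(dict(params))
    kept = dict(params)
    for name in list(params):
        trial = {k: v for k, v in kept.items() if k != name}
        try:
            ok = same(baseline, fetch(trial))
        except Exception:
            # Treat any error (e.g. a rejected request) as "param is required".
            ok = False
        if ok:
            kept = trial
    return kept


def fake_fetch(params: Params) -> dict:
    """Hypothetical stand-in for a site's API: needs `q` and `page`,
    ignores everything else."""
    if "q" not in params or "page" not in params:
        raise ValueError("400 Bad Request")
    return {"q": params["q"], "page": params["page"], "items": [1, 2, 3]}


# Hypothetical query string as it might look after "copy as cURL".
CAPTURED: Params = {
    "q": "edinburgh",
    "page": "1",
    "utm_source": "newsletter",
    "_ts": "1700000000",
}

print(prune_params(CAPTURED, fake_fetch))
```

Against a real site, exact equality is usually too strict (timestamps, request IDs), so `same` would more likely compare status code plus a few stable fields. The same loop also works for headers and cookies, which is often where the tedious part of the manual process is.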