beatthatflight 9 hours ago

Worth trying Claude/Gemini to see if they'll do some scraping for you. I've found some paywalled sites only too happy to let Gemini past the wall.

mebkorea 9 hours ago | parent [-]

Hadn't thought of that tbh. Worth a go on Liverpool especially... that's the AWS WAF one I'm currently blocked on and it is doing my head in. The challenge there is volume rather than access (~80k decisions to backfill), so even if an LLM gets through the wall I'd still need to script around it. But it could be a way in for the initial cookie. Cheers for the tip, will look into it.

mobilio 5 hours ago | parent [-]

Can you share the link to Liverpool so I can try it too, please?

mebkorea 3 hours ago | parent [-]

Sure, here you go: https://lar.liverpool.gov.uk/planning/index.html?fa=search

Heads up tho... it's behind AWS WAF with a JS challenge. Solving the challenge once works fine, but the WAF rate-limits the IP after ~10 requests and blocks for the rest of the day. So getting a session is doable, getting through 80,000 decisions is the hard bit. If you crack it I want to know! Cheers.
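
For a sense of scale, here's a rough back-of-envelope sketch. It assumes the ~10 requests per IP per day limit holds steady and that each decision costs exactly one request; both are assumptions, and `days_to_backfill` is just a made-up helper name for illustration.

```python
import math

def days_to_backfill(total_items, requests_per_day, requests_per_item=1):
    """Days needed to fetch every item from one IP under a daily request cap.

    requests_per_item=1 is an assumption; a real decision page may need more.
    """
    total_requests = total_items * requests_per_item
    return math.ceil(total_requests / requests_per_day)

# ~80,000 decisions at ~10 requests per day from a single IP
print(days_to_backfill(80_000, 10))  # 8000
```

So at the observed limit a single IP would take about 8,000 days, which is why the session is the easy part and the volume is the hard part.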