ashish-alex 9 hours ago

Working on a similar problem in another domain. I've found the agentic direction powerful: browser use plugged into a multimodal LLM with strong agentic capability, like GPT 5.4 mini, working in a loop with an orchestrator and an evaluator/judge.

mebkorea 9 hours ago | parent [-]

Nice! Yeah, I went the other way... deterministic scrapers per portal type, because once you've worked out the search form quirks for an Idox or Northgate or Ocellaweb, it's the same shape across every council using that platform. So the marginal cost of adding council N is config, not code. The agentic approach gets more interesting for the long tail though: the bespoke ASP.NET ones where every council is its own snowflake... and it is a GRIND honestly. How are you finding the loop on cost vs reliability?
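
For anyone wondering what "config, not code" might look like in practice, here is a rough sketch, not the actual setup. Everything in it is an assumption for illustration: the `Platform` and `Council` classes, the `search` path, the `dateStart`/`dateEnd` field names and the `.example` URLs are placeholders, not the real Idox, Northgate or Ocellaweb form details.

```python
from dataclasses import dataclass
from urllib.parse import urlencode, urljoin


@dataclass(frozen=True)
class Platform:
    """Search-form shape shared by every council on one platform (hypothetical)."""
    name: str
    search_path: str    # placeholder path, not a real platform's
    field_names: dict   # logical query name -> the platform's form field name


@dataclass(frozen=True)
class Council:
    name: str
    base_url: str
    platform: Platform


# Placeholder values: the real path and field names have to be worked out
# once per platform by inspecting its search form.
IDOX_LIKE = Platform(
    name="idox-like",
    search_path="search",
    field_names={"date_from": "dateStart", "date_to": "dateEnd"},
)


def build_search_url(council: Council, **query: str) -> str:
    """Map logical query names onto the platform's own field names."""
    fields = council.platform.field_names
    params = {fields[key]: value for key, value in query.items()}
    base = urljoin(council.base_url, council.platform.search_path)
    return base + "?" + urlencode(params)


# Adding council N is one more entry here, with no new code.
COUNCILS = [
    Council("Example A", "https://planning.a.example/", IDOX_LIKE),
    Council("Example B", "https://planning.b.example/portal/", IDOX_LIKE),
]
```

The platform quirks live in one `Platform` value, so a new council on a known platform only needs a name and a base URL.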

gnfargbl 8 hours ago | parent [-]

Deterministic scrapers are almost certainly the right answer for this task, because once those special snowflakes have paid for their bespoke IT system, they'll never change it.

On the grind, why not get an agent to help you build the long tail of deterministic scrapers? Claude etc. is really shockingly good at this kind of moderate-complexity iterative work; it will just keep going around the fetch/parse/understand loop until it has what you're looking for.

mebkorea 8 hours ago | parent [-]

Yeah, that's essentially what I'm doing. Claude handles most of the "look at the portal, work out the search form, write the config" loop. The actual bottleneck isn't code tbh, it's that every (snowflake) council needs like 30+ minutes of investigation before you can even get going, and a chunk of them dead-end because the portal's broken or migrated. I already hit three this morning. Worcester returns connection refused, Breckland's URL is dead, Rother migrated to a different platform. The grind is "is this portal even alive" more than the scraper itself.
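
The "is this portal even alive" step could in principle be triaged up front, before any investigation time is spent. A minimal sketch, assuming plain HTTP probing is enough to tell the cases apart: the `classify` and `probe` functions and the four labels are made up for illustration, and a redirect to another host is only a hint of a migration, since a platform change will not always show up that way.

```python
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urlparse


def classify(expected_url: str, status: Optional[int],
             final_url: Optional[str], error: Optional[str]) -> str:
    """Turn a probe outcome into a triage label (labels are hypothetical)."""
    if error is not None:
        return "unreachable"        # connection refused, DNS failure, timeout
    if status is not None and status >= 400:
        return "dead-url"           # server answered, but the page is gone
    if final_url and urlparse(final_url).netloc != urlparse(expected_url).netloc:
        return "possibly-migrated"  # redirected to a different host
    return "alive"


def probe(url: str, timeout: float = 10.0) -> str:
    """Fetch the URL once and classify what came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(url, resp.status, resp.geturl(), None)
    except urllib.error.HTTPError as exc:
        # HTTPError must be caught before URLError, which is its parent class.
        return classify(url, exc.code, None, None)
    except (urllib.error.URLError, OSError) as exc:
        return classify(url, None, None, str(exc))
```

Running `probe` over the whole council list first would sort portals into those three failure buckets plus "alive", so the 30+ minutes only goes on portals that actually respond.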