jayelbe 7 hours ago

As somebody who works in local government IT, constant scraping of our data like this is the bane of our lives. We get hit by thousands of these, many with no rate limiting, making hugely intensive requests that cause downtime and knock-on effects for actual customers and citizens. We block IPs, add captchas, and yet it persists.

If you really want the data, just FOI it for goodness' sake.

I get the distinct impression that many of these outfits aren't really advocating for improved transparency but are simply trying to exploit and monetise illicitly obtained government data to make a quick buck.

mebkorea 7 hours ago | parent | next [-]

Fair points, and yeah, you must be sick of mass scraping with no rate limiting. I run with 1.5-3 second delays from a single residential IP and back off when portals push back, but from your side I probably look the same as someone hammering you.

On your point regarding FOI, what you say is fair. I should probably have led with that for the trickier councils. The honest reason I haven't is that filing 240 FOI requests at scale felt like it'd put a different kind of strain on councils, but if you're telling me the scraping is worse, then I take that seriously.

On "monetise illicitly obtained data": I'm not going to pretend the £19 is altruism. But there is a public interest in this data being navigable across council boundaries, and that's not something individual councils can deliver. I must stress that I'm not sure I've got the model right yet, and a lot of today's feedback is pushing me toward making more of it free, which I'm seriously considering.
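For what it's worth, the pattern described above (a randomised pause before every request, plus exponential backoff when the server signals overload) can be sketched roughly like this. This is a minimal illustration, not the commenter's actual code; the function name and retry parameters are my own assumptions.

```python
import random
import time
import urllib.error
import urllib.request


def polite_fetch(url, max_retries=4, base_delay=1.5, max_delay=3.0):
    """Fetch a URL with a randomised 1.5-3s pause between requests,
    backing off exponentially when the server pushes back (429/503).

    A sketch of the "delay and back off" approach described above;
    names and defaults are illustrative assumptions.
    """
    for attempt in range(max_retries):
        # Pause before every request, not only after errors, so the
        # portal never sees a burst of back-to-back hits.
        time.sleep(random.uniform(base_delay, max_delay))
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code in (429, 503):
                # The server is signalling overload: wait longer each
                # retry (base_delay * 2^attempt) before trying again.
                time.sleep(base_delay * (2 ** attempt))
                continue
            raise  # other HTTP errors are not retryable here
    raise RuntimeError(f"gave up on {url} after {max_retries} attempts")
```

Even with this in place, from the server's side a long crawl is still a steady stream of automated requests, which is the tension the parent comment is pointing at.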

4 hours ago | parent | prev | next [-]
[deleted]
sublinear 7 hours ago | parent | prev [-]

Maybe I'm just naive, but why wouldn't a citizen do both?

I'm not implying that anything would get deliberately redacted, but it seems likely that information released through other channels would not match the web. A request might also reveal information that was not on the web.

What other choices are there?