nonethewiser 4 hours ago

>We should give up with the idea of databases which are 'open' to the public, but you have to pay to access, reproduction isn't allowed, records cost pounds per page, and bulk scraping is denied. That isn't open.

What about rate-limited access?

londons_explore 4 hours ago

No. Open is open. Beyond DDoS protections, there should be no limits.

If load on the server is a concern, make the whole database available as a torrent. People who run scrapers tend to prefer that anyway.

This isn't someone's hobby project run from a $5 VPS; they can afford to serve 10k qps of read-only data if needed, and it would cost far less than the salary of one staff member.

tchalla 3 hours ago

> Open is open.

Then I’d ask OpenAI to be open too, since open is open.

delichon 3 hours ago

Rate limiting is a DDoS protection.

wang_li an hour ago

You're describing a tragedy-of-the-commons situation. There is an organic query rate for this data, driven by the level of public interest. Then there is the inorganic vacuuming of the entire dataset by someone who wants to exploit public services for private profit. There is zero reason the public should socialize the cost of serving the excess load created by private parties looking to profit from public data.

I could have my mind changed if public policy were that any public data ingested into an AI system makes that AI system permanently free to use at any level of load. If a company thinks it should be able to put any load it wants on public services for free, it should be willing to provide its own services to the public at any load for free.

Gud 2 hours ago

The world is not black and white.

alberto467 3 hours ago

The issue with that is that they can then bury everything under huge piles of documents. That's bad enough when it's all clean, OCR'd digital data you can quickly download in its entirety, but if you're stuck waiting between document downloads, you'll never find what they don't want you to find.

It's like being made to search through sand: it's bad enough when you can use a sieve, but then they tell you that you can only use your bare hands, and your search becomes useless.

This is not a new tactic btw and pretty relevant to recent events...

BillinghamJ 3 hours ago

Systems running core government functions should be built to perform those functions efficiently at scale, so I'd say rate limiting should only restrict extreme load, i.e. DoS attacks.

hyperpape 4 hours ago

If the rate limit is reasonable (allows full download of the entire set of data within a feasible time-frame), that could be acceptable. Otherwise, no.
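To make "feasible time-frame" concrete: under a steady per-client limit, the time to mirror everything is roughly the number of requests needed divided by the allowed request rate. Below is a minimal Python sketch of that arithmetic. The function name, the 5-million-record dataset and the rates are all hypothetical, not figures from the thread, and the model ignores bursts, retries and failed requests.

```python
import math


def full_download_time_s(num_records: int, requests_per_second: float,
                         records_per_request: int = 1) -> float:
    """Seconds needed to fetch every record at a fixed request rate.

    Hypothetical model: a steady rate limit, no bursts, no failed requests.
    """
    if requests_per_second <= 0 or records_per_request <= 0:
        raise ValueError("rate and batch size must be positive")
    # Round up: a partial final batch still costs a whole request.
    requests_needed = math.ceil(num_records / records_per_request)
    return requests_needed / requests_per_second


# Hypothetical dataset of 5 million records, one record per request.
records = 5_000_000
for rps in (1, 10, 100):
    days = full_download_time_s(records, rps) / 86_400  # 86,400 s per day
    print(f"{rps:>3} req/s -> {days:.1f} days")
```

With these made-up numbers, one request per second means nearly two months for a full copy, while 100 requests per second (or fetching records in batches) brings it under a day.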