nostrademons 4 hours ago

That's sorta what MetaBrainz did - they offer their whole DB as a single tarball dump, much like what Wikipedia does. Downloading it took on the order of an hour; if I need a MusicBrainz lookup, I just do a local query.
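The local-query workflow might look something like this sketch: load a slice of the dump into a local SQLite file and query it instead of hitting the public API. The table, columns, and IDs below are illustrative placeholders, not the real MusicBrainz schema.

```python
import sqlite3

# Stand-in for a dump slice loaded into a local database.
# Schema and IDs are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (mbid TEXT, name TEXT)")
conn.executemany("INSERT INTO artist VALUES (?, ?)", [
    ("mbid-0001", "Example Artist"),
    ("mbid-0002", "Another Band"),
])

def lookup(name):
    # A local SELECT replaces a network round-trip to the public site.
    row = conn.execute(
        "SELECT mbid FROM artist WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

print(lookup("Another Band"))  # mbid-0002
```

Once the dump is imported, every lookup is a local disk read, so there's no reason to touch the website at all.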

For this strategy to work, people need to actually use the DB dumps instead of just defaulting to scraping. Unfortunately scraping is trivially easy, particularly now that AI code assistants can write a working scraper in ~5-10 minutes.

8note 8 minutes ago | parent | next [-]

The obvious thing would be to take down their website and only offer the DB dump.

If the dump is the useful thing, it doesn't need the web wrapper.

tonyhart7 2 hours ago | parent | prev [-]

I mean, this AI data scraper would need to scan and fetch billions of websites.

Why would they even care about one single website? You expect an institution to care about one site out of the billions they must scrape daily?

what an hour ago | parent [-]

This is probably the reason. It's more effort to special-case every site that offers dumps than to just unleash your generic scraper on it.