Remix.run Logo
squigz 4 hours ago

> To archive Metabrainz there is no way but to browse the pages slowly page-by-page. There's no machine-communicable way that suggests an alternative.

Why does there have to be a "machine-communicable way"? If these developers cared about such things they would spend 20 seconds looking at this page. It's literally one of the first links when you Google "metabrainz"

https://metabrainz.org/datasets

what an hour ago | parent [-]

You expect the developers of a crawler to look at every site they crawl and develop a specialized crawler for them? That’s fine if you’re only crawling a handful of sites, but absolutely insane if you’re crawling the entire web.

wtetzner 3 minutes ago | parent [-]

Isn't the point of AI that it's good at understanding content written for humans? Why can't the scrapers run the homepage through an LLM to detect that?

I'm also not sure why we should be prioritizing the needs of scraper writers over human users and site operators.