It affects science too (and that is where you'd want archiving to be as solid as possible). Increasingly, metadata is full of errors, and general-purpose search engines for science are breaking down, even things like Google Scholar. I suppose some big science publishers are blocking AI bots too.

▲

shevy-java 5 hours ago | parent | next [-]

On top of that, Google ruined its own search engine as well.

We are increasingly becoming blind. To me it looks as if this is done on purpose actually.

▲

salawat 5 hours ago | parent [-]

It was. Advertising is incompatible with accurate data retrieval/routing. We've also implemented "obligation to deindex". So providing an unbiased index of the web as she is is essentially (in the U.S.) verboten.

▲

ninjagoo 5 hours ago | parent | prev | next [-]

> I suppose some big science publishers are blocking AI bots too.

That's a travesty, considering that a huge chunk of science is public-funded; the public is being denied the benefits of what they're paying for, essentially.

▲

galleywest200 5 hours ago | parent [-]

The public can still access the sites themselves.

▲

ninjagoo 5 hours ago | parent [-]

> The public can still access the sites themselves.

Indefinitely? Probably not.

What about when a regime wants to make the science disappear?

▲

thwarted 5 hours ago | parent | next [-]

So the solution is to allow the AI scraping and hide the content, with significantly reduced fidelity and accuracy and not in the original representation, in some language model?

▲

mlnj 3 hours ago | parent [-]

Don't forget the onslaught of ads that will distort the actual publications even more going forward.

▲

pa7ch 5 hours ago | parent | prev [-]

What has that got to do with blocking AI crawlers?

▲

ninjagoo 5 hours ago | parent [-]

If it's publicly funded, why shouldn't AI crawlers have access to that data? Presumably those creating the AI crawlers paid taxes that paid for the science.

▲

JumpCrisscross 4 hours ago | parent [-]

> If it's publicly funded, why shouldn't AI crawlers have access to that data?

Because it costs money to serve them the content.

▲

wyre 3 hours ago | parent [-]

If I build a business based off of consumption of publicly funded data, and that’s okay, why isn’t it okay for AI?

Is the answer to regulate AI? Yes.

▲

JumpCrisscross 2 hours ago | parent [-]

> If I build a business based off of consumption of publicly funded data, and that’s okay, why isn’t it okay for AI?

Because when you build it you aren't, presumably, polling their servers every fifteen minutes for the entire corpus. AI scrapers are currently incredibly impolite.

▲

asdff 3 hours ago | parent | prev [-]

Thank god for PubMed and deterministic search operators.