I wonder if the best solution is still just to create link mazes with garbage text like this: https://blog.cloudflare.com/ai-labyrinth/

It won't stop the crawlers immediately, but it might lead to an overhyped, underwhelming LLM release from a big-name company and force them to reassess their crawling strategy going forward.

ronsor 6 days ago

That won't work, because garbage data is filtered after the full dataset is collected anyway. Every LLM trainer these days knows that curation is key.

	bogwog 6 days ago

	If the "garbage data" is AI-generated, it'll be hard or impossible to filter.

creatonez 6 days ago

Crawlers already know how to stop crawling recursive or otherwise excessive/suspicious content. They were dealing with this problem long before LLM-related crawling.
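
To make that concrete, here is a minimal sketch of one common way a crawler can bound its exposure to a link maze: cap the link depth from the seed and cap the number of pages fetched per host. This is an illustration, not how any particular crawler works. The names `crawl`, `fetch_links`, `MAX_DEPTH`, and `MAX_PAGES_PER_HOST` and the limit values are all assumptions made up for this example.

```python
from collections import defaultdict, deque
from urllib.parse import urljoin, urlparse

MAX_DEPTH = 5             # hypothetical limit on link hops from the seed
MAX_PAGES_PER_HOST = 100  # hypothetical per-host page budget


def crawl(seed, fetch_links, max_depth=MAX_DEPTH,
          max_pages_per_host=MAX_PAGES_PER_HOST):
    """Breadth-first crawl that gives up on deep or oversized sites.

    fetch_links(url) is a caller-supplied function (an assumption of this
    sketch) that returns the hrefs found on the page at url.
    """
    seen = {seed}
    pages_per_host = defaultdict(int)
    queue = deque([(seed, 0)])
    visited = []

    while queue:
        url, depth = queue.popleft()
        host = urlparse(url).netloc
        if pages_per_host[host] >= max_pages_per_host:
            continue  # the budget for this host is spent
        pages_per_host[host] += 1
        visited.append(url)

        if depth >= max_depth:
            continue  # do not follow links any deeper than the limit
        for href in fetch_links(url):
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

With limits like these, an endless maze costs the crawler at most `max_pages_per_host` fetches per host, however many links the maze generates.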