bogwog 6 days ago:
I wonder if the best solution is still just to create link mazes filled with garbage text, like this: https://blog.cloudflare.com/ai-labyrinth/. It won't stop the crawlers immediately, but it might lead to an overhyped and underwhelming LLM release from a big-name company, and force them to reassess their crawling strategy going forward?
ronsor 6 days ago:
That won't work, because garbage data is filtered out after the full dataset is collected anyway. Every LLM trainer these days knows that curation is key.
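To make the point concrete, a rough sketch of what this kind of post-hoc curation can look like: a cheap character-entropy filter that discards documents whose distribution doesn't look like natural prose. The thresholds and the function names here are made up for illustration; real pipelines use far more sophisticated signals (model-based quality classifiers, dedup, perplexity filters).

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Bits per character of the text's character distribution."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_like_garbage(text: str,
                       min_entropy: float = 2.0,
                       max_entropy: float = 5.0) -> bool:
    """Flag documents whose character entropy falls outside the band
    typical of natural-language prose (thresholds are illustrative)."""
    if not text.strip():
        return True
    h = shannon_entropy(text)
    return h < min_entropy or h > max_entropy
```

A filter this crude would pass well-formed generated gibberish, which is exactly why serious curation stacks layer several such heuristics rather than relying on any one.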
creatonez 6 days ago:
Crawlers already know how to stop crawling recursive or otherwise excessive/suspicious content. They dealt with this problem long before LLM-related crawling existed.
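For the curious, the classic trap defenses are simple heuristics along these lines: cap crawl depth and bail out when a URL's path starts repeating itself, which is a telltale sign of an infinite link maze. The cutoffs and function names below are invented for illustration, not taken from any real crawler.

```python
from collections import Counter
from urllib.parse import urlparse

MAX_DEPTH = 8     # illustrative depth cutoff
MAX_REPEATS = 3   # same path segment this many times smells like a trap

def should_skip(url: str, depth: int) -> bool:
    """Heuristic trap detection: skip links that are too deep or whose
    path repeats the same segment over and over."""
    if depth > MAX_DEPTH:
        return True
    segments = [s for s in urlparse(url).path.split("/") if s]
    if segments and Counter(segments).most_common(1)[0][1] >= MAX_REPEATS:
        return True
    return False
```

Production crawlers combine checks like these with per-host page budgets and content-similarity detection, so a maze mostly burns a bounded amount of crawl budget rather than trapping the bot.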