hamdingers 4 hours ago
Yes. It's a basic scraper that fetches the document, parses it for URLs with a regex, then fetches all of those, and repeats forever. I've done honeypot tests with links in HTML comments, links in JavaScript comments, routes that only appear in robots.txt, etc. All of them get hit.
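
To make the pattern concrete, here's a minimal sketch of the kind of crawler being described. The seed URL, the regexes, and the politeness delay are my own assumptions, not the actual bot's code:

```python
import re
import time
import urllib.request
from urllib.parse import urljoin

# Hypothetical seed; the parent comment describes the bots' behavior,
# not their code -- this just reproduces that behavior.
SEED = "https://example.com/"

# Grab anything URL-shaped: absolute URLs anywhere in the body (including
# HTML comments, JS comments, robots.txt) plus href/src attribute values.
ABS_RE = re.compile(r'https?://[^\s"\'<>)]+')
ATTR_RE = re.compile(r'(?:href|src)\s*=\s*["\']([^"\']+)["\']', re.I)

def crawl(seed, delay=1.0):
    seen = set()
    frontier = [seed]
    while frontier:                      # "repeat forever" until the frontier drains
        url = frontier.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                body = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue                     # dead links, non-text responses: just move on
        found = ABS_RE.findall(body) + [urljoin(url, rel) for rel in ATTR_RE.findall(body)]
        frontier.extend(u for u in found if u not in seen)
        time.sleep(delay)

if __name__ == "__main__":
    crawl(SEED)
```

Because it extracts anything URL-shaped instead of parsing the DOM, it follows links in comments and robots.txt just as readily as real anchors, which is exactly what the honeypot tests show.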
dumbfounder an hour ago
We need to update robots.txt for the LLM world: help them find things more efficiently (or not at all, I guess), provide specs for actions that can be taken, and so on.
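
For reference, robots.txt can already single out known LLM crawlers by user agent (GPTBot, ClaudeBot, and CCBot are published crawler names); the idea above would layer richer, machine-readable hints on top of something like this. The domain and sitemap path below are placeholders:

```
# Opt known LLM/AI crawlers out entirely
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else: crawl, but use the machine-readable index instead of guessing
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
```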