simonw 11 hours ago
People often use specific user agents in there, which is hard if you don't know what the user agents are in advance!
lxgr 8 hours ago, replying to simonw
That seems like a potentially very useful addition to the robots.txt "standard": crawler categories. Wanting to disallow LLM training (or optionally only training of closed-weight models) while encouraging search indexing, or even LLM retrieval in response to user queries, seems popular enough to justify it.
wat10000 11 hours ago, replying to simonw
If you're using a specific user agent, then you're saying "I want this specific user agent to follow this rule, and not any others." Don't be surprised when a new bot does what you say! If you don't want any bots reading something, use a wildcard.
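A quick way to see this behavior is Python's standard-library `urllib.robotparser`, which follows the same fallback rule as RFC 9309: a crawler obeys the group that names it and falls back to the `User-agent: *` group only when no group matches. This is a minimal sketch that assumes two made-up robots.txt files, the URL `https://example.com/page`, and a hypothetical crawler called `SomeNewBot`. `GPTBot` is used only as an example of a named agent. Real crawlers use their own parsers, so treat this as an illustration, not a guarantee of any particular bot's behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one named crawler only.
SPECIFIC = """\
User-agent: GPTBot
Disallow: /
"""

# Hypothetical robots.txt that blocks every crawler via the wildcard group.
WILDCARD = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str = "https://example.com/page") -> bool:
    """Parse robots_txt and report whether `agent` may fetch `url`."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse() also marks the file as "read"
    return rp.can_fetch(agent, url)


if __name__ == "__main__":
    # "SomeNewBot" is a hypothetical crawler that no group names explicitly.
    for name, txt in (("specific", SPECIFIC), ("wildcard", WILDCARD)):
        for agent in ("GPTBot", "SomeNewBot"):
            print(f"{name:8} {agent:10} allowed={allowed(txt, agent)}")
```

With the specific file, `GPTBot` is blocked but `SomeNewBot` is allowed, because no group applies to it and there is no `*` group to fall back on. With the wildcard file, both are blocked. To keep named exceptions and still set a default for bots you haven't heard of, put a `User-agent: *` group alongside the specific ones.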