rozab · 4 hours ago
I had the same issue when I first put up my gitea instance. The bots found the domain through cert registration (certificate transparency logs) within minutes, before there were any backlinks: GPTBot, ClaudeBot, PerplexityBot, and others. I added a robots.txt with explicit UAs for the known scrapers (they seem to ignore wildcards), and after a few days the traffic died down completely; I've had no problems since. Git frontends are basically a tarpit, so they're uniquely vulnerable to this, but I wonder if these folks actually tried a good robots.txt? I know it's wrong that they ignore wildcards, but it does seem to solve the issue.
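(A minimal robots.txt in this spirit might look like the sketch below. GPTBot, ClaudeBot, and PerplexityBot are the published user-agent tokens for the bots named above; the exact set of entries is up to the operator.)

    # Name the known AI scrapers explicitly, since they
    # seem to ignore the * wildcard.
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: PerplexityBot
    Disallow: /

Compliant crawlers cache robots.txt and re-fetch it periodically rather than on every request, which fits the "after a few days" timeline above.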

stefanka · an hour ago (replying to rozab)
Where does one find a good robots.txt? Are there any well-maintained ones out there?

trillic · 3 hours ago (replying to rozab)
I'll second a good robots.txt. I just checked my metrics: fewer than 100 requests total to my git instance in the last 48 hours. The instance is completely public; most repos are behind a login, but there are a couple that are public and linked.

bob1029 · 4 hours ago (replying to rozab)
> I wonder if these folks actually tried a good robots.txt?

I suspect that some of these folks are not interested in a proper solution. Being able to vaguely claim that the AI boogeyman is oppressing us has turned into quite the pastime.