▲ | diggan 3 days ago
> The other problem with logs is that it's very difficult to filter out all of the bot traffic. It's not very difficult, but it isn't effortless either. Start with something like https://github.com/allinurl/goaccess/blob/master/config/brow... which captures 99% of the crawlers out there. Then, when you notice one particular user-agent/IP/IP-range making a bunch of requests, add it to the list and re-run. Filtering based on ASNs that you see being used for crawling lets you remove most of the AI agents too. We've been dealing with this problem for over two decades now, and there are solutions out there that remove almost all of it from your logs.
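The workflow above (start from a known user-agent list, then iterate) can be sketched roughly like this. This is a minimal illustration, not goaccess itself: the `CRAWLER_PATTERNS` list is a small hypothetical subset standing in for goaccess's much longer browsers list, and it assumes logs in the common "combined" format where the user-agent is the last quoted field.

```python
import re

# Hypothetical subset of crawler user-agent patterns, standing in for
# goaccess's browsers list; in practice you'd keep extending this as
# you notice new crawlers in the logs.
CRAWLER_PATTERNS = [
    r"Googlebot", r"bingbot", r"GPTBot", r"CCBot",
    r"AhrefsBot", r"SemrushBot", r"python-requests", r"curl/",
]
CRAWLER_RE = re.compile("|".join(CRAWLER_PATTERNS), re.IGNORECASE)

# Combined log format: the user-agent is the last double-quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')

def is_bot(log_line: str) -> bool:
    """True if the line's user-agent matches a known crawler pattern."""
    m = UA_RE.search(log_line)
    return bool(m and CRAWLER_RE.search(m.group(1)))

def filter_log(lines):
    """Yield only the lines that don't look like crawler traffic."""
    return (ln for ln in lines if not is_bot(ln))
```

IP-range and ASN filtering would bolt on the same way: another predicate per line, consulting a CIDR list or an IP-to-ASN lookup, re-run over the logs each time the lists grow.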
▲ | lazide 3 days ago | parent
Sounds about as much fun as manual spam filtering, but with worse tools. :(