jsheard 3 days ago
The other problem with logs is that it's very difficult to filter out bots masquerading with browser user agents, of which there are a lot nowadays. I've watched the logs on newly registered domains that aren't published anywhere besides the certificate transparency logs, and seen the majority of traffic coming from "Chrome". Yeah, I'm sure you are.
diggan 3 days ago
> The other problem with logs is that it's very difficult to filter out all of the bot traffic.

It's not very difficult, but it isn't effortless. Start with something like https://github.com/allinurl/goaccess/blob/master/config/brow... which captures 99% of the crawlers out there. Then, when you notice one particular user-agent/IP/IP-range making a lot of requests, add it to the list and re-run. Filtering based on the ASNs you see being used for crawling lets you catch most of the AI agents too.

We've been dealing with this problem for over two decades now, and there are solutions out there that remove almost all of it from your logs.
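A minimal sketch of that workflow in Python, assuming common/combined log format on stdin; the crawler patterns, the CIDR range, and the helper names here are illustrative placeholders you'd populate from a list like the one linked above and from your own ASN lookups:

    #!/usr/bin/env python3
    """Drop log lines whose user-agent matches a known-crawler pattern
    or whose client IP falls in a blocked range."""
    import ipaddress
    import re
    import sys

    # Substrings typical of self-identifying crawlers; extend from a
    # maintained list such as goaccess's browsers config.
    CRAWLER_UA = re.compile(r"bot|crawler|spider|curl|python-requests", re.I)

    # Example CIDR range (placeholder). In practice, populate this from
    # the ASNs you see doing the crawling.
    BLOCKED_RANGES = [ipaddress.ip_network("203.0.113.0/24")]

    # Combined log format: client IP is the first field, the user-agent
    # is the last quoted field on the line.
    LOG_LINE = re.compile(r'^(\S+) .* "([^"]*)"$')

    def keep(line: str) -> bool:
        m = LOG_LINE.match(line.rstrip())
        if not m:
            return True  # pass unparseable lines through untouched
        ip_str, ua = m.groups()
        if CRAWLER_UA.search(ua):
            return False
        try:
            ip = ipaddress.ip_address(ip_str)
        except ValueError:
            return True
        return not any(ip in net for net in BLOCKED_RANGES)

    if __name__ == "__main__":
        for line in sys.stdin:
            if keep(line):
                sys.stdout.write(line)

Run it as a pre-filter (e.g. `python3 filter_logs.py < access.log > filtered.log`) and re-run after each pass as you add the noisy user-agents and ranges you spot.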