diggan 3 days ago

> The other problem with logs is that it's very difficult to filter out all of the bot traffic.

It's not very difficult, but it isn't effortless either. Start with something like https://github.com/allinurl/goaccess/blob/master/config/brow..., which captures 99% of the crawlers out there. Then, when you notice one particular user-agent/IP/IP range making a bunch of requests, add it to the list and re-run. Filtering on the ASNs you see being used for crawling lets you filter out most of the AI agents too.
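A minimal sketch of that workflow in Python, assuming access logs in the Apache/Nginx "combined" format. The user-agent substrings and the CIDR ranges below are hypothetical placeholders (the ranges are documentation-only addresses); in practice you'd load a maintained crawler list like the goaccess one linked above, and expand each crawler ASN into the prefixes it announces using a real ASN/prefix database.

```python
import ipaddress
import re

# Hypothetical user-agent substrings; a real setup would load a maintained list.
BOT_UA_SUBSTRINGS = ["bot", "crawl", "spider"]

# Hypothetical blocked ranges standing in for "IPs/ranges/ASN prefixes you noticed".
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr) for cidr in ["192.0.2.0/24", "198.51.100.0/24"]
]

# Combined log format: ip ident user [time] "request" status size "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def is_bot(ip: str, user_agent: str) -> bool:
    """True if the user agent matches a bot substring or the IP is in a blocked range."""
    ua = user_agent.lower()
    if any(s in ua for s in BOT_UA_SUBSTRINGS):
        return True
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # unparseable IP: don't treat it as a bot
    # An IPv6 address is never "in" an IPv4 network (and vice versa), so mixing is safe.
    return any(addr in net for net in BLOCKED_NETWORKS)


def filter_log(lines):
    """Yield only the log lines that look like human traffic."""
    for line in lines:
        m = COMBINED_RE.match(line)
        if m is None:
            continue  # skip lines that aren't in combined format
        if not is_bot(m.group("ip"), m.group("ua")):
            yield line
```

Feeding it a line from 203.0.113.5 with a plain Mozilla user agent, one with a Googlebot user agent, and one from 192.0.2.44 keeps only the first. The "add it to the list and re-run" step is then just appending to `BOT_UA_SUBSTRINGS` or `BLOCKED_NETWORKS`. Note that a bare "bot" substring is deliberately broad and can produce false positives.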

We've been dealing with this problem for over two decades now, and there are solutions out there that remove almost all of it from your logs.

lazide 3 days ago

Sounds about as much fun as manual spam filtering, but with worse tools. :(

diggan 3 days ago

It literally takes 5 minutes to set up at most, and most analytics tools ship with an "ignore webcrawlers" option somewhere (goaccess does, for example), which takes 0 minutes to use :)