reconnecting 7 days ago

The website I mentioned has over 15k webpages and ~200 GB of media, and yet we monitor bots manually and only block them if they're pulling 5k requests in a row. Malicious URLs and repeated 404s are blocked by default. HEAD requests are rejected.
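To make that policy concrete, here is a rough Python sketch of the three rules. It is an illustration only, not our actual setup: the class and method names are made up, the 404 limit and the idle gap that ends a "row" of requests are assumed values, and in practice the 5k rule is applied by a person looking at the traffic rather than by code.

```python
from collections import defaultdict

STREAK_LIMIT = 5000     # "5k requests in a row", from the comment above
NOT_FOUND_LIMIT = 20    # assumed: no number is given for repeated 404s
IDLE_RESET_S = 60.0     # assumed: a pause this long ends a "row" of requests


class BotPolicy:
    """Hypothetical in-memory version of the blocking rules."""

    def __init__(self):
        self.blocked = set()
        self.streak = defaultdict(int)
        self.not_found = defaultdict(int)
        self.last_seen = {}

    def allow(self, client, method, now):
        """Decide before handling a request; False means reject it."""
        if method == "HEAD" or client in self.blocked:
            return False
        # A long enough pause starts a new run of requests for this client.
        if now - self.last_seen.get(client, now) > IDLE_RESET_S:
            self.streak[client] = 0
        self.last_seen[client] = now
        self.streak[client] += 1
        if self.streak[client] >= STREAK_LIMIT:
            self.blocked.add(client)
            return False
        return True

    def record_status(self, client, status):
        """Call after handling so repeated 404s count against the client."""
        if status == 404:
            self.not_found[client] += 1
            if self.not_found[client] >= NOT_FOUND_LIMIT:
                self.blocked.add(client)
```

Everything below the limits is served as usual, which is the point: the thresholds are high enough that ordinary crawlers never hit them.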

Even on a very bad day, the server's page load time doesn't go over 1s.

However, it seems I'm indeed looking at the problem through the wrong lens: the comments suggest that the underlying issue is performance, and the bots are what expose it.

Groxx 7 days ago | parent [-]

I think a good chunk of it is bot-induced performance problems, yea. Whether that's compute or transfer. And advertisement costs.

Optimization is very very much not a solved problem though, just look at basically all software ever written - it's written for an optimization priority and to a price point (whether commercial $$ or via personal time), and that target's value to its users has shifted rather dramatically.

reconnecting 7 days ago | parent [-]

This is really interesting. I indeed looked at this problem from the wrong perspective.

I'm working on an open-source tool that could be useful for bot detection, but I'm still not confident that anyone would deploy it on-prem and take on the setup/maintenance instead of just routing traffic through the cloud.

Perhaps performance as a KPI could work. Thanks!

Groxx 7 days ago | parent [-]

I think you'd definitely find some interest, e.g. anyone that intentionally avoids "the cloud" will want something local. Honestly I assume there are some of these already, monitoring apache/nginx/etc logs. Anubis is arguably similar and has been exploding lately, for example, though I'm not sure if it auto-updates its rules at all: https://github.com/TecharoHQ/anubis
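For a sense of what the log-monitoring approach looks like at its simplest, here's a minimal Python sketch. It assumes the access log uses the common/combined format (client IP first, then the quoted request line and status code); the function name and regex are mine, not taken from Anubis or any other existing tool.

```python
import re
from collections import Counter

# Assumes common/combined log format:
# IP - user [time] "METHOD /path PROTO" status bytes ...
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def summarize(lines):
    """Count total requests and 404 responses per client IP."""
    requests, not_found = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that don't fit the assumed format
        requests[m["ip"]] += 1
        if m["status"] == "404":
            not_found[m["ip"]] += 1
    return requests, not_found
```

Feeding it `open("access.log")` and looking at `requests.most_common(10)` is usually enough to see who is hammering the server; anything smarter (rules, auto-updates) is where the real tools come in.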

As to if it'd get enough interest: yea no idea at all. I wish you luck tho! Clearly there's a need for this kind of thing.

reconnecting 7 days ago | parent [-]

Our team develops a risk-based analytics system that we also use for bot detection. From our perspective, bots shouldn't be blindly blocked, but rather properly monitored and blocked only when necessary. Here is a live demo (1) to give you a general idea.

1. https://play.tirreno.com (admin/tirreno)