danpalmer 18 hours ago

How do you define a user, and how do you define online?

If the forum considers unique cookies to be a user and creates a new cookie for any new cookie-less request, and if it considers a user to be online for 1 hour after their last request, then actually this may be one scraper making ~6 requests per second. That may be a pain in its own way, but it's far from 23k online bots.
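As a rough back-of-the-envelope sketch of that estimate (assuming every request arrives without a cookie and so starts a fresh session, and that `implied_request_rate` is just a hypothetical helper name), the steady-state "online" count is roughly the request rate times the session window, so the rate can be read back out:

```python
def implied_request_rate(online_count: int, window_seconds: int) -> float:
    """Requests per second needed for a single cookie-less client to
    show up as `online_count` users, if each request creates a new
    session that stays "online" for `window_seconds`."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return online_count / window_seconds


# 23k "users" with a 1-hour window works out to roughly 6.4 requests/second.
print(f"{implied_request_rate(23_000, 3600):.2f} req/s")
```

The same division with a shorter window gives a proportionally higher rate, which is why the window length matters so much for the estimate.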

crote 18 hours ago

That's still 518,400 requests per day. For static content. And it's a niche forum, so it's not exactly going to have millions of pages.

Either there are indeed hundreds or thousands of AI bots DDoSing the entire internet, or a couple of bots are needlessly hammering it over and over and over again. I'm not sure which option is worse.

n1xis10t 18 hours ago

Imagine if all this scraping was going into a search engine with a massive index, or a bunch of smaller search engines that a meta-search engine could be made for. This’d be a lot more cool in that case.

thethingundone 18 hours ago

AFAIK it keeps a user counted as online for 5 or 15 minutes (I think 5). It’s a Woltlab Burning Board.

Edit: it’s 15 minutes.

danpalmer 16 hours ago

And what is a "user"?

thethingundone 14 hours ago

Whatever the forum software Woltlab Burning Board considers a user. If I recall correctly, it tries to build an identifier based on PHP session ids, so most likely simply cookies.

danpalmer 13 hours ago

This is exactly my point. Scrapers typically don't store cookies, so every single request is likely to be a "new" user as far as the forum software is concerned.

Couple that with 15-minute session times, and that could just be one entity scraping the forum at 30 requests per second. One scraper going moderately fast sounds far less bad than 29,000 bots.

It still sounds excessive for a niche site, but I'd guess this is sporadic, or that the forum software has a page structure that accidentally traps scrapers, which is quite easy to do.
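To make the session-counting argument concrete, here is a toy model in Python. It is only a sketch under stated assumptions: `OnlineCounter` and `simulate_scraper` are hypothetical names, and this is not how Woltlab Burning Board is actually implemented. It just assumes "online" means "a session whose last request was within the window" and that a request without a cookie gets a new session.

```python
import itertools


class OnlineCounter:
    """Toy cookie-based "users online" counter (illustrative only)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.last_seen = {}            # session id -> time of last request
        self._new_ids = itertools.count()

    def handle_request(self, now, cookie=None):
        # A request that sends no cookie is given a brand-new session id.
        session_id = cookie if cookie is not None else next(self._new_ids)
        self.last_seen[session_id] = now
        return session_id

    def online(self, now):
        # A session counts as online if it was seen within the window.
        cutoff = now - self.window_seconds
        return sum(1 for t in self.last_seen.values() if t >= cutoff)


def simulate_scraper(rate_per_second, duration_seconds, window_seconds):
    """One client that never sends its cookie back, at a fixed rate."""
    counter = OnlineCounter(window_seconds)
    for i in range(rate_per_second * duration_seconds):
        counter.handle_request(now=i / rate_per_second)
    return counter.online(now=duration_seconds)


# One scraper at 30 requests/second, 15-minute window, run for 30 minutes.
print(simulate_scraper(30, 1800, 900))
```

In this model a single scraper at 30 requests per second is reported as 27,000 users online, while a client that returns its cookie on every request is only ever counted once.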