taeric 3 hours ago

I confess a sad assumption that bot traffic is far higher than we have admitted for a long time. Though, maybe we would see different stats specifically for social media sites, where bots astroturf like counts? It certainly feels like we have known for a long time that bots were a larger share of ad viewing than ad companies wanted to admit.

reconnecting 2 hours ago | parent | next [-]

I don't understand what difference bots make. For me, a website (the public part) is a storefront. People walk down the street and see what's inside — that's the purpose. If something should not be available immediately, that's the private part of the store.

I've been monitoring bot traffic on digital platforms for over 10 years. Sure, the crawler share is growing, some even with malicious intentions, and those I detect and block.

I disagree that this pain is worth the cost of making real people spend their life on verification.

Groxx 13 minutes ago | parent | next [-]

For efficiently-hosted sites with little media it's not too bad. E.g. hosting a static site just doesn't cost much, even if you're hammered occasionally.

That's extremely far from all sites though. It's probably safe to say it's a severe minority, particularly when you ignore personal / non-profit-bringing sites. Tons of small and large sites run stuff like poorly-written WordPress or Ruby on Rails or thousands of microservices doing god knows what. A major increase in request volume on those can easily mean significant increases in hosting charges (e.g. small-% on big, many multiples on small).

taeric 2 hours ago | parent | prev | next [-]

For ad views, the concern is specifically that people pay for clicks and views. That that can be so heavily influenced by bot traffic greatly undermines their value.

Same general idea goes for any of the algorithmically driven platforms. The algorithms are ostensibly intended to surface organically discovered things by watching how people interact with things. That they are so susceptible to distortion through bot farms should be a lot more acknowledged than it is. People trust them far more than they should.

There is also a general cost of running things concern. It isn't like it is completely free to execute on bot traffic.

reconnecting 2 hours ago | parent [-]

For ads, I believe this must be a problem for ad platform owners.

If the digital platform's storefront is their business, they could afford to spend some budget on bot detection. Bots still come from data center networks, sometimes render pages incompletely, request resources in bulk, and show enough patterns to be flagged internally.

If we look at a medium website, most random crawlers will come from Amazon, Microsoft, DigitalOcean, Hetzner, OVH, and a few other DC networks — these can be blocked easily without harming real users. The rest can be detected and cleaned up, even manually.
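
A minimal sketch of the data center check described above, assuming you already have a list of CIDR ranges for the networks you want to flag. The ranges below are placeholders from the reserved documentation blocks, not real provider ranges, and `is_datacenter_ip` is a hypothetical helper name:

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real provider ranges.
# In practice you would load the ranges for the networks you want to flag.
DATACENTER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter_ip(addr, networks=DATACENTER_NETWORKS):
    """Return True if addr falls inside any of the listed networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


# Flag request IPs pulled from an access log (sample values only).
for client_ip in ["192.0.2.55", "203.0.113.9"]:
    print(client_ip, is_datacenter_ip(client_ip))
```

This only covers the network-origin signal; incomplete rendering and bulk resource requests would need separate checks against the logs.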

The math is simple: 20,000 visits a day at 15 seconds each = ~83 hours a day lost watching a Cloudflare logo, just because someone doesn't want to dig into the logs. I don't buy it.
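
Spelled out, using the same figures (the function name is just for illustration):

```python
def hours_lost_per_day(visits_per_day, seconds_per_visit):
    """Total human time spent waiting on a challenge page, in hours per day."""
    return visits_per_day * seconds_per_visit / 3600


# 20,000 visits * 15 s = 300,000 s, which is about 83.3 hours.
print(round(hours_lost_per_day(20_000, 15), 1))
```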

taeric an hour ago | parent [-]

Largely agreed, though I think you are likely underestimating how hard this is to detect. In particular, it is true that many bots can be hosted in data centers, but it is somewhat trivial to launder that traffic through other sources. Malware, in particular, is what I have in mind. Maybe I'm wrong and that has largely gone away?

There is also a bit of mixed incentives. Yes, it is the ad platform that is getting abused. But it is also the ad platform that is charging people based on abused practices.

And it isn't like this is completely made up. Just look at how Facebook killed a ton of people during the "pivot to video" programs. I don't know all of the details, as I was thankfully not in any of the involved industries, but my understanding is that it is fairly well documented.

Edit: I changed an "isn't" to "is." I think I was trying to reword at one point, but left it in a way that is opposite what I meant.

LorenPechtel an hour ago | parent | prev [-]

When most of your server capacity is going to answering the scrapers it matters. It's not that the stuff is hidden, it's that storefront being flooded with 10x as many customers as the fire code allows. And some of them go around asking your employees mindless questions. (Small forum I help moderate: we were getting hammered with what was probably some sort of AI that was taking search queries and feeding them into the forum search. Search is now registered users only.)

mikey_p 2 hours ago | parent | prev [-]

Well, the fun thing is that no one knows how much traffic of what kind they are getting when they use Cloudflare.

You get the numbers that Cloudflare tells you, but who knows if you can trust their stats after their CEO is apparently cherry-picking data to shape their product narrative?

thewebguyd an hour ago | parent [-]

That's the same CEO who just went on a wild, tone-deaf layoff justification, classifying human employees into roles of either a builder, seller, or measurer and saying he wants to get rid of everyone that "measures" the business...

I wouldn't trust a single thing coming out of his mouth.