1vuio0pswjnm7 3 hours ago

There is an unfortunate incentive created when a "business" (MiTM) depends on "bot traffic", i.e., the continued nuisance of bot traffic, to make money

If the "bot traffic" declines, then the "bot protection business" goes down with it

Cloudflare communications are sometimes careful to refer to traffic _labeled as_ bot traffic versus actual bot traffic

Because the "business" relies on the existence of "bot traffic", there's an incentive to broaden the scope of what is labeled as "bot traffic"

The false positive rate can be high. The public should see those statistics, but in truth it may be infeasible to compile them when there's no verification and the entire system relies on heuristics

"Bot protection" can be used to gather fingerprints for marketing

It can be used to force users to use certain software, e.g., certain browsers, and to enable Javascript subjecting users to data collection, surveillance and ads

Originally the motivation for avoiding "bot traffic" was based on behaviour, e.g., exceeding acceptable rates of usage, making too many requests in a given time period, exceeding rate limits
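
As a rough illustration of the behaviour-based approach described above, here is a minimal sliding-window rate limiter sketch. The class name, the limit of 3 requests per 10 seconds, and the use of an IP address as the client key are all assumptions made for illustration, not anything a particular vendor is known to use.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in any `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        # client key (e.g. an IP address) -> timestamps of recently allowed requests
        self.hits = defaultdict(deque)

    def allow(self, client, now):
        q = self.hits[client]
        # Discard timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: judged on request rate only
        q.append(now)
        return True


# Hypothetical client making five requests at the given times (seconds).
limiter = SlidingWindowLimiter(limit=3, window=10.0)
results = [limiter.allow("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0, 12.0)]
```

Nothing in this sketch looks at what software made the request. A client that makes a single request is never refused, whatever browser or tool it uses.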

Now it can be used to exclude traffic based on criteria such as what browser someone is using. NB. This is more than "user-agent string". The company forces people to sign NDAs before telling them what it is doing to fingerprint www users

If residential proxies are the problem then why not go after the companies that provide them

The truth is that those companies are not the problem. Their customers are so-called "tech" companies

Perhaps it's these so-called "tech" companies that are the problem

Certainly the problem is not the individual www user who doesn't use an "approved" graphical, Javascript-enabled browser who gets blocked or fingerprinted trying to make a single request

But that's who suffers from "bot protection" so that so-called "tech" companies can profit from data collection, surveillance and ads

1vuio0pswjnm7 2 minutes ago | parent | next [-]

Consider fingerprinting as a commercially-oriented data collection method in addition to a "bot protection" method

Is data collection for "age verification" truly for age verification purposes only

Many bloggers and online commenters are not willing to accept that assumption

Is fingerprinting different

It is data collection for the purpose of establishing "online identity"

To the extent it's based on user-agent strings or other HTTP headers it requires allowing a third party to MiTM TLS connections

Some bloggers and online commenters are opposed to computer users who MiTM _their own_ TLS connections

How is Cloudflare different

Xirdus 2 hours ago | parent | prev | next [-]

> Originally "bot traffic" was based on behaviour, e.g., exceeding acceptable rates of usage, making too many requests in a given time period, exceeding rate limits

> Now it can be used to exclude traffic based on criteria such as what browser someone is using

I'm pretty sure user-agent-based bot detection predates every request-rate-based method by quite a few years.

gruez an hour ago | parent | prev | next [-]

>It can be used to force users to use certain software, e.g., certain browsers, and to enable Javascript subjecting users to data collection, surveillance and ads

>Certainly the problem is not the individual www user who doesn't use an "approved" graphical, Javascript-enabled browser who gets blocked or fingerprinted trying to make a single request

The alternatives to javascript fingerprinting are either ineffective (TLS fingerprinting and/or IP rate limits), or even worse for privacy (e.g. attestation).

>If residential proxies are the problem then why not go after the companies that provide them

realusername 32 minutes ago | parent [-]

> The alternatives to javascript fingerprinting are either ineffective (TLS fingerprinting and/or IP rate limits), or even worse for privacy (e.g. attestation).

Javascript fingerprinting itself is ineffective. These kinds of checks only stop the most basic bots, and I'd argue the same for attestation.

gruez 15 minutes ago | parent [-]

It's ineffective in the sense that in the worst case, bots can buy used iPads or whatever and use a robot arm + camera to do the scraping, but each incremental step increases the cost for scrapers. TLS fingerprinting means you can't use curl/requests and call it a day. Javascript makes it even more complicated by requiring a headful browser to solve challenges. The purpose is to increase the cost, not to eliminate all bots.
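
To make the TLS fingerprinting step concrete, here is a minimal sketch in the style of the publicly documented JA3 method: the numeric fields of a ClientHello are joined into a string and hashed with MD5. The function name and the ClientHello values are made up for illustration and do not correspond to any real client. The sketch also omits details such as GREASE handling, and nothing in this thread establishes which method any given vendor uses.

```python
import hashlib


def ja3_style_fingerprint(version, ciphers, extensions, curves, point_formats):
    """Hash ClientHello parameters into a JA3-style fingerprint.

    Fields are comma-separated; values within a field are dash-separated.
    """
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    raw = ",".join(fields)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Hypothetical ClientHello values, NOT captured from real clients.
browser_like = ja3_style_fingerprint(
    771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0]
)
script_like = ja3_style_fingerprint(
    771, [4866, 4867, 4865], [0, 11, 10], [29, 23], [0]
)
```

Because cipher order and extension lists differ between TLS stacks, a script using a stock HTTP library produces a different hash from a browser. Matching the browser's hash means imitating its TLS stack, which is the incremental cost described above.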

qaq an hour ago | parent | prev [-]

In most instances I've seen of people paying for Cloudflare, the main motivator was load balancing or DDoS protection. Obviously anecdotal ...