andyferris a day ago

I have to ask: what OS do AI-training web scrapers tend to report? (A mixture? One with more than 5% Linux market share? Sorry for being a sceptic; otherwise I think this is fantastic news, if accurately measured.)

da_chicken a day ago

Most of these types of surveys do their best to filter out robots.

With over 50% of Internet traffic being robots, the results really don't make any sense at all if you don't.
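As a rough illustration of that kind of filtering, here is a minimal Python sketch, assuming the survey only sees User-Agent strings. The bot substrings and the OS-matching rules are hypothetical simplifications, not what any real survey uses. Note that a scraper that spoofs an ordinary desktop browser passes straight through a filter like this.

```python
import re
from collections import Counter

# Hypothetical, deliberately incomplete list of substrings that
# self-identified crawlers put in their User-Agent strings.
BOT_PATTERN = re.compile(r"bot|crawler|spider|scraper", re.IGNORECASE)


def classify_os(user_agent: str) -> str:
    """Very rough OS guess from a User-Agent string (check order matters)."""
    ua = user_agent.lower()
    if "android" in ua:            # Android UAs also contain "Linux"
        return "Android"
    if "windows" in ua:
        return "Windows"
    if "iphone" in ua or "ipad" in ua:  # iOS UAs also contain "Mac OS X"
        return "iOS"
    if "mac os x" in ua or "macintosh" in ua:
        return "macOS"
    if re.search(r"\bcros\b", ua):  # ChromeOS reports "CrOS"
        return "ChromeOS"
    if "linux" in ua:
        return "Linux"
    return "Other"


def os_share(user_agents):
    """Percentage share per OS, after dropping self-identified bots."""
    humans = [ua for ua in user_agents if not BOT_PATTERN.search(ua)]
    counts = Counter(classify_os(ua) for ua in humans)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    sample = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (compatible; ExampleBot/1.0; +https://example.com/bot)",
    ]
    # The self-identified bot is dropped; Windows and Linux split 50/50.
    print(os_share(sample))
```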

hollerith a day ago

Good question. Most of these headlines about Linux market share ("mind share"?) are completely uninformative about how widespread the use of Linux is in reality.

12 years ago or so, a similar headline appeared. Then someone explained that the Chinese government had recently cracked down on Windows piracy (to appease the Americans), so some PC vendors had stopped including (pirated copies of) Windows on the computers they sold, shipping some Linux distro instead, of course. But since pirated Windows install media was still widely available, a cultural practice quickly grew up in which the consumer installs Windows (or gets his more technically inclined cousin to do it for him) as soon as he gets his new PC home. The headline's statistic did not capture this practice because it counted only the OSes on computers at the point of sale (i.e., "OS shipments").

cowboylowrez a day ago

I wonder if they used Firefox to download Internet Explorer?

okasaki a day ago

What's "Windows pirating" when Microsoft offers public ISO downloads and you can activate them with MAS?

hollerith a day ago

The details of how the Chinese PC buyer gets Windows onto his new PC are irrelevant to my point (as is whether it deserves the name "pirating").

Nab443 a day ago

I tend to think that they mostly use their own user agent, or, if not, disguise themselves as the most common ones to avoid being detected too easily. Web scraping was probably mostly running on Linux before the age of AI anyway. I'm not in the field, so if anyone has more trustworthy info on that...

eloisant a day ago

Yes, they run Linux, but they either use their own user agent (not included in the stats) or spoof a real-world web browser, in which case they might be spoofing Chrome on Windows even though they run on Linux.

Either way, I don't think the 5% figure is affected by scraping bots.

viraptor a day ago

None: see https://platform.openai.com/docs/bots. There's no reason for those bots to report any specific OS.

triknomeister a day ago

Anything that's automated today runs Linux. So I'd assume almost 99.99%, or maybe BSD in some cases.

input_sh a day ago

Any scraper out there that doesn't want to identify itself as such is very likely to spoof the most commonly used OS + browser combo (Chrome + Windows), regardless of what it's actually running on.
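A minimal Python sketch of what that looks like, using only the standard library: the script runs on whatever machine hosts it (often Linux), but the request's User-Agent header claims Chrome on Windows. The UA string and its version numbers are illustrative assumptions, and the request is only built here, never sent.

```python
import platform
import urllib.request

# Illustrative Chrome-on-Windows User-Agent; version numbers are made up.
SPOOFED_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request that claims to be Chrome on Windows."""
    return urllib.request.Request(url, headers={"User-Agent": SPOOFED_UA})


if __name__ == "__main__":
    req = build_request("https://example.com/")
    # urllib normalizes header names with str.capitalize(), hence "User-agent".
    print("actually running on:", platform.system())
    print("claims to be:", req.get_header("User-agent"))
```

Any analytics that buckets visitors by User-Agent would count this request as Windows, whatever OS the scraper actually runs on.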

baal80spam a day ago

So basically the 5% number is pulled out of thin air.