rdoherty 4 hours ago

Skimming the list, it looks like most extensions are for scraping or automating LinkedIn usage. Not surprising, as there's money to be made with LinkedIn data. Scraping was a problem when I worked there; the abuse teams built some reasonably sophisticated detection & prevention, and it was a constant battle.

cxr 3 hours ago | parent | next [-]

In order to create the data source that LinkedIn's extension-fingerprinting relies on to work, someone (at LinkedIn*?) almost certainly violated the Chrome Web Store TOS—by (perversely*) scraping it.

* if LinkedIn didn't get it from an existing data source

winddude 3 hours ago | parent | prev | next [-]

A problem for LinkedIn != "a problem". The real problem for people is the backroom data brokering that LinkedIn and others do.

bryanrasmussen 4 hours ago | parent | prev | next [-]

From the code, it doesn't look like they do anything when they find a match; they just save all the results to a CSV for fingerprinting?

cxr 3 hours ago | parent [-]

"The code" here you're referring to (fetch_extension_names.js[1]) isn't and doesn't claim to be LinkedIn's fingerprinting code. It's a scraper that the researcher behind this repo wrote themselves in order to create the CSV of the data that they're publishing here.

LinkedIn's fingerprinting code, as the README explains, is found in fingerprint.js[2], which embeds a big JSON literal with the IDs of the extensions it probes for. (Sickeningly enough, this data starts about two-thirds of the way through the file* and isn't the culprit behind the bulk of its 2.15 MB size…)

* On line 34394; the one starting:

    const r = [{
                id: "aacbpggdjcblgnmgjgpkpddliddineni",
                file: "sidebar.html"

1. <https://github.com/mdp/linkedin-extension-fingerprinting/blo...>

2. <https://github.com/mdp/linkedin-extension-fingerprinting/blo...>
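
For readers wondering how a page can probe for an extension at all, here is a minimal sketch of one well-known approach: request a resource at the extension's `chrome-extension://` URL and see whether the request succeeds. This is an illustration only, not LinkedIn's actual code. The `ProbeTarget` shape just mirrors the `id` and `file` fields quoted above, while `probeUrl`, `detectExtensions`, and the injected `Fetcher` are hypothetical names introduced here. It also assumes the extension exposes the file to web pages.

```typescript
// Hypothetical shape, mirroring the `id` / `file` fields quoted above.
interface ProbeTarget {
  id: string;   // Chrome extension ID
  file: string; // a resource path inside that extension
}

// Build the URL a page could request to test for one extension.
function probeUrl(target: ProbeTarget): string {
  return `chrome-extension://${target.id}/${target.file}`;
}

// A fetch-like function is injected so the sketch also runs outside a browser.
type Fetcher = (url: string) => Promise<unknown>;

// Resolve to the IDs whose probe request succeeded.
async function detectExtensions(
  targets: ProbeTarget[],
  fetcher: Fetcher,
): Promise<string[]> {
  const results = await Promise.all(
    targets.map(async (t) => {
      try {
        await fetcher(probeUrl(t));
        return t.id; // request succeeded: the resource is reachable
      } catch {
        return null; // request failed: not installed, or file not exposed
      }
    }),
  );
  return results.filter((id): id is string => id !== null);
}
```

In a browser, the fetcher would presumably be something like `(url) => fetch(url)`. A failed probe does not prove the extension is absent, only that this particular file could not be loaded.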

hsbauauvhabzb 4 hours ago | parent | prev | next [-]

Won't someone think of poor little LinkedIn, a subsidiary of one of the largest data brokers in the world?

charcircuit 4 hours ago | parent | next [-]

Why frame what you are trying to say like that? Businesses of all sizes deserve the ability to protect their businesses from abuse.

jmward01 3 hours ago | parent | next [-]

Do they respect my data? Why do they get to track me across sites when I clearly don't want them to, but someone can't scrape their data when they don't want them to? Why should big companies get a pass but individuals not? They clearly consider internet traffic fair game and are invasive and abusive about it, so it is not only fair to be invasive and abusive back, it is self-defense at this point.

hsbauauvhabzb 3 hours ago | parent | next [-]

They don’t need to track your web browser when they’re owned by Microsoft, because they track every action at a lower level.

0x1ch an hour ago | parent | next [-]

Weird, I don't use Windows as an OS but have linkedin. I'd believe the concern and disregard of Linkedin's concern is fair game.

missingdays 3 hours ago | parent | prev [-]

What lower level? Does Microsoft own the internet?

zelphirkalt 3 hours ago | parent [-]

The operating system. For example see the Windows 11 screenshot debacle/scandal.

3 hours ago | parent | prev | next [-]
[deleted]
john-h-k an hour ago | parent | prev [-]

Because you signed up to a set of terms and conditions saying LinkedIn can use your data in this way

echelon an hour ago | parent [-]

I didn't want the web to turn into monolithic platforms. I abhor this status quo.

You cannot function without these enterprises, but that doesn't mean they're ideal or even ethical.

Microsoft wins because of network effects. It's impossible to compete. So I think it should be allowed to assail their monopoly here by any means. It's maximally fair for consumers and for free markets.

Ideally capitalism remains cutthroat and impossible to grow into undislodgeable titans.

Even more ideally, this would become a distributed protocol rather than a privately owned and guarded database.

ronsor 4 hours ago | parent | prev | next [-]

I think they framed it this way because they don't consider scraping abuse (to be fair, neither do I, as long as it doesn't overload the site). Botting accounts for spam is clear abuse, however, so that's fair game.

hsbauauvhabzb 3 hours ago | parent [-]

No, I consider all data collection and scraping egregious. From that perspective, LinkedIn is hypocritical when Microsoft discloses every filesystem search I do locally to Bing.

dylan604 2 hours ago | parent [-]

Are you not scraping a site with your eyeballs when you view a site?

RockRobotRock an hour ago | parent | prev | next [-]

When they scrape, it’s innovation. When you scrape, it’s a felony.

nitwit005 3 hours ago | parent | prev | next [-]

I'm sure there are issues with fake accounts for scraping, but the core issue is that LinkedIn considers the data valuable. LinkedIn wants to be able to sell the data, or access to it at least, and the scrapers undermine that.

They could stop all the scraping by providing a downloadable data bundle like Wikipedia.

compiler-guy 3 hours ago | parent [-]

LLMs scrape Wikipedia all the time, or at least attempt to.

The data bundle doesn't help that at all.

sellmesoap 4 hours ago | parent | prev | next [-]

We enjoy the fruits of an LLM or two from time to time, derived from hoards of ill-gotten data. LinkedIn has the resources to attempt to block scraping, but even at the resource scale of LI I doubt the effort is effective.

charcircuit 3 hours ago | parent [-]

I am not denying that scraping is useful. If it weren't, people wouldn't do it. But if the site rules say you aren't allowed to scrape, then I don't think people should be hostile towards the people enforcing the rules.

ronsor 3 hours ago | parent [-]

Well, they can try to enforce the rules; that's perfectly fair. At the same time, there are many methods of "trying" which I would not consider valid or acceptable ones. "Enforcing the rules" does not give a carte blanche right to snoop and do "whatever's necessary." Sony tried that with their CD rootkits and got multiple lawsuits.

cyanydeez 21 minutes ago | parent | prev | next [-]

the abuse>using the information they publish to the public

b112 3 hours ago | parent | prev | next [-]

Yes, until it becomes abusive and malignly affects innocents.

schmidtleonard 4 hours ago | parent | prev [-]

The big social media businesses deserve a Teddy Roosevelt character swooping in and busting their trusts, forcing them to play ball with others even if it destroys their moats. Boo hoo! Good riddance. World's tiniest violin.

This is a popular position across the aisle. Here's hoping the next guy can't be bought, or at least asks for more than a $400M tacky gold ballroom!

xp84 4 hours ago | parent | prev [-]

I mean, regardless of who they are or even if you don’t like what LinkedIn does themselves with the data people have given them, the random third parties with the extensions don’t additionally deserve to just grab all that data too, do they?

mathfailure 3 hours ago | parent | next [-]

Surely they do! The data is in the public internets, aren't they?

ronsor 3 hours ago | parent [-]

They'd put Widevine or PlayReady DRM on the website if they could, I'm sure.

bigfishrunning 3 hours ago | parent [-]

why can't they?

josephg 3 hours ago | parent | prev | next [-]

Eh. I worked at a company which made an extension which scraped LinkedIn. We provided a service to recruiters, who would start a hiring process by putting candidates into our system.

The recruiters all had LinkedIn paid accounts, and could access all of this data on the web. We made a browser extension so they wouldn’t need to do any manual data entry. Recruiters loved the extension because it saved them time.

I think it was a legitimate use. We were making LinkedIn more useful to some of their actual customers (recruiters) by adding a somewhat cursed API integration via a Chrome extension. Forcing recruiters to copy and paste didn’t help anyone. Our extension only grabbed content on the page the recruiter had open. It was purely read-only and scoped by the user.

xp84 an hour ago | parent [-]

Doesn't sound like your operation was particularly questionable, but I can imagine there must be some of those 3,000 extensions where the data flow isn't just "DOM -> End User" but more of a "DOM -> Cloud Server -> ??? -> Profit!" with perhaps a little detour where the end user gets some value too as a hook to justify the extension's existence.

hsbauauvhabzb 2 hours ago | parent | prev | next [-]

I say the same thing about my start menu sending every action I perform to Bing.

sieabahlpark 4 hours ago | parent | prev [-]

[dead]

dumbo23 3 hours ago | parent | prev [-]

[dead]