clusterhacks 3 days ago

I am confused about how to feel about the data the paper is based on. If you look at the paper, the data description is:

"Our primary data source is a detailed LinkedIn-based resume dataset provided by Revelio Labs ...

We complement the worker resume data with Revelio’s database of job postings, which tracks recruitment activity by the firms since 2021 ...

The final sample consists of 284,974 U.S. firms that were successfully matched to both employee position data and job postings and that were actively hiring between January 2021 and March 2025. For these firms, we observe 156,765,776 positions dating back to 2015 and 245,838,118 job postings since 2021, of which 198,773,384 successfully matched with their raw text description."

They identified 245 million job postings from 2021 forward in the United States? I mean, the U.S. population is only about 222 million for the 18-65 age group (based on Wikipedia: 64.9% of a total population of 342 million).

And they find a very small percentage of firms using generative AI:

"Our approach allows us to capture firms that have actively begun integrating generative AI into their operations. By this measure, 10,599 firms, about 3.7 percent of our sample, adopted generative AI during the study period."

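The quoted percentages can at least be checked for internal consistency. A minimal sketch in Python, using only the counts quoted above (the variable names are illustrative, not the paper's):

```python
# Counts as quoted from the paper's data description.
firms_in_sample = 284_974
firms_adopting_genai = 10_599
postings_total = 245_838_118
postings_with_text = 198_773_384

# Share of sampled firms flagged as generative AI adopters.
adoption_share = firms_adopting_genai / firms_in_sample
# Share of postings that were matched to their raw text description.
text_match_share = postings_with_text / postings_total

print(f"Adopting firms: {adoption_share:.1%}")
print(f"Postings matched to raw text: {text_match_share:.1%}")
```

The first ratio comes out to about 3.7 percent, which matches the paper's own figure. The second says roughly 81 percent of the postings have a matched text description.
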
Maybe I am wildly underestimating just how much LinkedIn is used worldwide for recruiting? As a tech person, I'm also very used to seeing the same job listing re-listed by what seems to be a large number of low-effort "recruiting" firms on LinkedIn.

For figuring out how generative AI is affecting entry-level jobs, I would have been much more interested in some case studies. Something like finding three to five companies (larger than startups? 100+ employees? 500+?) that have decided, as a matter of policy, to hire fewer entry-level employees by adding generative AI into their work. Then maybe circle back from the case studies to this larger LinkedIn dataset and tie the case study information into the LinkedIn data somehow.

squigz 3 days ago | parent [-]

> For these firms, we observe 156,765,776 positions dating back to 2015 and 245,838,118 job postings since 2021, of which 198,773,384 successfully matched with their raw text description."

I'm obviously misreading this somehow. How do you have 156m positions dating back to 2015, but far more than that number in a smaller timeframe?

clusterhacks 3 days ago | parent | next [-]

I think it is just poorly worded. From another point in the paper:

"Our analysis draws on a new dataset that combines LinkedIn resume and job-posting data from Revelio Labs. The dataset covers nearly 285,000 U.S. firms, more than 150 million employment spells from roughly 62 million unique workers between 2015 and 2025, and over 245 million job postings."

I guess we can read that as saying the authors identified 62 million workers who held 150 million positions over the 2015-2025 time window.

I'm still deeply skeptical about the underlying data. The 62 million represents a huge percentage of employed people in the U.S. in any of the years 2015-2025. This source shows 148 million/yr to 164 million/yr employed over that timeframe:

  https://www.statista.com/statistics/269959/employment-in-the-united-states/

On the other hand, I also saw estimates saying LinkedIn has approximately 30% of the U.S. workforce with a profile on the platform. Which is wild to me.
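
To put rough numbers on that, here is a back-of-the-envelope sketch using the figures above. One caveat it assumes away: the 62 million is a cumulative count of unique workers over 2015-2025, while the employment figures are per-year totals, so the ratios are only a loose comparison.

```python
# Figures from the thread, in millions.
unique_workers = 62   # unique workers in the dataset, 2015-2025
employed_low = 148    # approx. U.S. employment, low end of the range
employed_high = 164   # approx. U.S. employment, high end of the range
spells = 150          # "more than 150 million employment spells"

# Dataset workers as a share of annual U.S. employment.
share_vs_low = unique_workers / employed_low
share_vs_high = unique_workers / employed_high
# Average number of positions held per worker over the window.
spells_per_worker = spells / unique_workers

print(f"Share of employed: {share_vs_high:.0%} to {share_vs_low:.0%}")
print(f"Average spells per worker: {spells_per_worker:.1f}")
```

That works out to roughly 38 to 42 percent of annual employment, and about 2.4 positions per worker over the ten-year window.
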
squigz 3 days ago | parent [-]

Ah okay, positions (actual jobs) vs job postings. Makes sense. Thanks!

legacynl 3 days ago | parent | prev [-]

You're misreading it: 156m positions and 245m job postings are two different counts. A single position can have multiple job postings created for it.
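
As a rough illustration, dividing the two quoted counts gives the implied number of postings per position. This is only a sketch: the counts cover different windows (positions since 2015, postings since 2021), so the ratio is not a true per-position average.

```python
positions = 156_765_776  # positions observed, dating back to 2015
postings = 245_838_118   # job postings since 2021

# Implied postings per observed position (windows differ, see caveat).
postings_per_position = postings / positions
print(f"Postings per position: {postings_per_position:.2f}")
```

The ratio is about 1.6, which is plausible once reposts and recruiter duplicates are counted.
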