Hey, wait – is employee performance Gaussian distributed? (timdellinger.substack.com)
325 points by timdellinger 3 days ago | 47 comments
sangnoir 3 days ago | parent | next [-]

> Performance management, as practiced in many large corporations in 2024, is an outdated technology that is in need of an update

Author made a couple of fundamental mistakes: the first is they assume employees are (or should be) paid according to how much they "individually" earned the company. Employers strive to pay employees the minimum they can bear, on employer's terms. Those terms are information asymmetry and a Gaussian distribution. Fairness is the last thing one should expect from employers, but being honest about this is not good for morale, so instead, they rely on keeping employees uninformed, while the employers collude to gather everyone's remuneration history via the Work Number.

The second mistake they made is to assume that companies would prioritize being lean and trimming the mediocre & bottom 5%. There are other considerations: combined productivity is more important than having individual superstars working on the shiniest features. How much revenue do you think a janitor or café staffer generates? Close to zero. The same goes for engineering. Someone has to do the unglamorous stuff, or you end up with a dysfunctional company, with amazing talent (on paper).

Edit: there's an infamous graph that shows aggregate worker productivity and average income. The two tracked closely, rising in tandem until the 1970s, when they decoupled, with income becoming much flatter and productivity continuing to rise. That's how the world has been for the past 50 years on the macro and the micro.

ianbicking 3 days ago | parent | prev | next [-]

"IQ is Gaussian" – it was pointed out somewhere, and only then became obvious to me, that IQ is not Gaussian. The distribution is manufactured.

If you have 1000 possible IQ questions, you can ask a bunch of people those questions, and then pick out 100 questions that form a Gaussian distribution. This is how IQ tests are created.

This is not unreasonable... if you picked out 100 super easy questions you wouldn't get much information, everyone would be in the "knows quite a lot" category. But you could try to create a uniform distribution, for instance, and still have a test that is usefully sensitive. But if you worry about the accuracy of the test then a Gaussian distribution is kind of convenient... there's this expectation that 50th percentile is not that different than 55th percentile, and people mostly care about that 5% difference only with 90th vs 95th. (But I don't think people care much about the difference between 10th percentile and 5th... which might imply an actual Pareto distribution, though I think it probably reflects more on societal attention)
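The idea that a Gaussian shape can be manufactured can be illustrated with a small sketch. It does not model the question-selection procedure described above, and it is not how any particular IQ test is built. It assumes a simpler rank-based rescaling onto the conventional 100/15 scale, and `norm_scores` and the skewed raw scores are made up for illustration.

```python
import random
from statistics import NormalDist, mean, pstdev

def norm_scores(raw_scores, center=100.0, spread=15.0):
    """Rescale raw scores onto a Gaussian by rank (hypothetical helper)."""
    n = len(raw_scores)
    target = NormalDist(center, spread)
    # Indices of test takers ordered from lowest to highest raw score.
    order = sorted(range(n), key=lambda i: raw_scores[i])
    scaled = [0.0] * n
    for rank, idx in enumerate(order):
        # The mid-rank percentile stays strictly inside (0, 1).
        scaled[idx] = target.inv_cdf((rank + 0.5) / n)
    return scaled

rng = random.Random(0)
# Assumed raw scores: strongly right-skewed, nothing like a bell curve.
raw = [rng.expovariate(1.0) for _ in range(10_000)]
scaled = norm_scores(raw)
print(round(mean(scaled), 2), round(pstdev(scaled), 2))
```

Whatever shape the raw scores have, the rescaled scores come out bell-shaped by construction, and only the ordering of test takers is preserved.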

Anyway, kind of an aside, but also similar to what the article itself is talking about

bhouston 3 days ago | parent | prev | next [-]

I would feel better if this was derived from empirical data rather than just rhetoric. This seems super testable, no? There is probably a ton of data already in different industries with regards to productivity.

Even if human talent has a Pareto distribution (which is not clear), the people employed by a company are a selected sub-set of that population, which would likely have a different distribution depending on how they are selected and the task at hand.

I think that any of these simplified distributions are likely not generalizable across companies and industries (e.g. the productivity of AWS or Google employees is likely not distributed like that of employees of McDonald's or Wal*Mart because of the difference in hiring procedures and the nature of the tasks.)

Get hard data within the companies and industry you are in and then you can make some arguments. Otherwise, I feel it is too easy to just be talking up a sand castle that has no solid footing.
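As a rough idea of what such a test could look like, here is a minimal sketch that fits both candidate distributions by maximum likelihood and compares log-likelihoods. The function names and the synthetic samples are assumptions for illustration; real productivity data would have to be positive-valued and measured consistently, and using the sample minimum as the Pareto scale is a crude simplification.

```python
import math
import random

def gaussian_loglik(xs):
    """Log-likelihood of xs under a Gaussian fitted by maximum likelihood."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def pareto_loglik(xs):
    """Log-likelihood of xs under a Pareto fitted by maximum likelihood.

    Assumption: the scale x_min is taken to be the sample minimum.
    """
    n = len(xs)
    x_min = min(xs)
    alpha = n / sum(math.log(x / x_min) for x in xs)
    log_sum = sum(math.log(x) for x in xs)
    return n * math.log(alpha) + n * alpha * math.log(x_min) - (alpha + 1) * log_sum

def better_fit(xs):
    """Name of the model with the higher fitted log-likelihood."""
    return "pareto" if pareto_loglik(xs) > gaussian_loglik(xs) else "gaussian"

rng = random.Random(42)
# Synthetic stand-ins for real measurements (not data from any company).
synthetic_pareto = [rng.paretovariate(2.0) for _ in range(5_000)]
synthetic_gauss = [rng.gauss(100.0, 15.0) for _ in range(5_000)]
print(better_fit(synthetic_pareto), better_fit(synthetic_gauss))
```

A serious analysis would also consider other candidates (log-normal, for instance) and use a proper model-comparison procedure, but even this crude check needs actual data to say anything.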

jedberg 3 days ago | parent | prev | next [-]

One of the things I loved about working at Netflix was that the base assumption was that everyone was a top performer. If you weren't a top performer, you were given a severance check.

The analogy we used was a sports team. Pro sports teams have really good players and great players. Some people are superstars, but unless you're at least really really good you're not on the team.

Performance and compensation were completely separate, which was also nice. Performance evals were 360 peer reviews, and compensation was determined mostly by HR based on what it was costing to bring in new hires, and then bumping everyone up to that level.

So at least at Netflix 10 years ago, performance wasn't really distributed at all. Everyone was top 10% industrywide.

hemloc_io 3 days ago | parent | prev | next [-]

Cool data/idea, and anecdotally lines up with my experience at BigCos from a coworker perspective.

But in my experience employee perf evals are more political than data based.

At the end of the day a lot of mgmt at BigCo, esp these days, wants that 10% quota for firing as a weapon/soft layoff and the "data" is a fig leaf to make that happen. More generously it's considered a forcing function for managers to actually find underperformers in their orgs, even if they don't exist. Either way it's not really based on anything other than their own confirmation bias.

IME the scrutiny of perf evaluation is basically tied to the trajectory of the company and labor market conditions. Even companies with harder perf expectations during the good times of ~2021 relaxed their requirements.

riazrizvi 3 days ago | parent | prev | next [-]

This is a well constructed empty argument because it glosses over the central concern, ‘employee performance’. Without defining that we have no idea what the graph represents.

nonameiguess 3 days ago | parent | prev | next [-]

It's worth hammering on this point as much as possible hoping a few people listen, but there is at least one other important point about employee performance. If you're allocating bonuses, a single year's performance is probably a good way to do that, assuming you can accurately measure it. When you're talking retention and promotion, though, you're making a prediction of future performance, possibly at a variety of different jobs. That is even harder to do and more poorly reflected in the last year's results. You have some analogies to sports performance in this article, and you see this kind of thing all the time there. Guy does great in a single year, gets a huge, possibly long-term contract, then tanks. On the other hand, one of the better dynasties of the past decade was accomplished by the Golden State Warriors in the US NBA thanks to underpaying one of the all-time great players in NBA history because he suffered a series of ankle injuries early in his career and scared off other suitors. Single-year performance isn't necessarily reflective of a person's true mean abilities, and their place in the Pareto distribution won't be the same at all levels of advancement and responsibility, either.

The problem, from a company's perspective, is you probably need to retain everyone at least five years, and actually give them a wide variety of assignments in that time, to really get any usable data about their long-term prospects.
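The gap between a single year's result and true mean ability can be sketched with a toy simulation. Everything here is an assumption made for illustration: ability and yearly luck are both drawn from standard Gaussians with equal weight, which is a modeling choice and not something measured.

```python
import random
from statistics import mean

rng = random.Random(7)
PEOPLE = 10_000

# Assumed model: observed yearly result = stable ability + independent luck.
ability = [rng.gauss(0.0, 1.0) for _ in range(PEOPLE)]
year1 = [a + rng.gauss(0.0, 1.0) for a in ability]
year2 = [a + rng.gauss(0.0, 1.0) for a in ability]

# Pick the top 10% by year-one result, as a one-year review would.
top = sorted(range(PEOPLE), key=lambda i: year1[i], reverse=True)[: PEOPLE // 10]
top_year1 = mean([year1[i] for i in top])
top_year2 = mean([year2[i] for i in top])
print(round(top_year1, 2), round(top_year2, 2))
```

Under these assumptions the year-one stars are still above average the next year, but noticeably less so, because part of what got them selected was luck that does not repeat.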

crazygringo 3 days ago | parent | prev | next [-]

This is very unconvincing. The author already admits one reason why:

> But there are low-performing employees at large corporations; we’ve all seen them. My perspective is that they’re hiring errors. Yes, hiring errors should be addressed, but it’s not clear that there’s an obvious specific percentage of the workforce that is the result of hiring errors.

I think it is clear that we expect a certain percentage of hiring "errors". And that they are not binary but rather a continuum. And that there are lots of other factors like employees who were great when they were hired but stopped caring and are "coasting" or just burnt out, who got promoted or transferred when they shouldn't have been and are bad at their new level/role, and so forth.

The Pareto distribution isn't particularly relevant here, because a hiring process isn't trying to get a whole slice of the overall labor market with clear cutoffs. For any position, it's trying to maximize the performance it can get at a given salary, and we have no reason to expect the errors it makes in under- and over-estimating performance to be anything but relatively symmetric.

So a Gaussian distribution is a far more reasonable assumption than a slice of the Pareto distribution, when you look at the multiplicity of factors involved.

doctorpangloss 3 days ago | parent | prev | next [-]

This article: "Wouldn't it be cool if when you measure employee performance, it turned out to fit a Pareto distribution better than a Gaussian?"

Would that be cool? We could posit the implications of all sorts of improbabilities. But I feel more strongly about how cool it would be that P = NP.

All this aside, being laid off sucks - being pushed out, even when you're a high performer, sucks even more. The truth is that "data science" does not help you process grief the way reading Dostoevsky does, so maybe getting an A in your liberal arts education is valuable even when you are working as a software developer.

wavemode 3 days ago | parent | prev | next [-]

This concept is not new - see [0].

There's ample research that Welchian stack ranking, and assuming a Gaussian distribution of employee performance, is not well-founded. Even its original pioneers (General Electric) have abandoned the practice (see [1]).

Not sure why there are so many commenters here defending the Gaussian model. Most researchers at this point agree that a pareto distribution is more realistic.

[0]: https://hbr.org/2022/01/we-need-to-let-go-of-the-bell-curve

[1]: https://qz.com/428813/ge-performance-review-strategy-shift

iambateman 3 days ago | parent | prev | next [-]

As employees, our expectations for performance management come from the system of giving grades in school.

What's interesting is that school grades often don't follow a normal distribution, especially for easier classes. I suspect that getting an "A" was possible for 95%+ of students in my gym class and only 5-10% of the students in my organic chemistry class.

In the same way, some jobs are much easier to do well than others.

So we should expect that virtually all administrative positions will have "exceptional" performance, which is to say that they were successful at doing all of the tasks they were asked to do. But for people whose responsibility-set is more consequential, even slightly-above average performance could be 10x more meaningful to the company.

dogleash 3 days ago | parent | prev | next [-]

To me the biggest insight here is that no matter what data science you're trying to do on a group of employees, the people you already have decided should be fired or promoted from that group are outliers and should be removed from the sample.

There are certainly times that you would want them included, but those can be classified under "budgeting," not gaining insight on a workforce.

jampa 3 days ago | parent | prev | next [-]

Going through some performance reviews as a manager, I always try to push back a bit against the bell curve. It kinda reminds me of "stack ranking." There are also some factors to be considered:

If you are in a hiring freeze or not promoting, most of the curve should shift right, assuming you are hiring great people. They will probably perform better quarter after quarter. Some might counter-argue that if everyone performs better, this should be the "new expectation," but I disagree: the market sets expectations.

If you have someone at a senior level with expectations of staff, for example, they won't be in the company for long. I hired many great engineers who later said they only looked for a new job because they were never promoted despite being overperformers.

seiferteric 3 days ago | parent | prev | next [-]

A lot of focus on employee performance, but relatively little on management performance. I always wonder how a once great company can slowly decline into irrelevance. Take Yahoo for example: it could only be due to management failure over several decades, right? How can companies optimize for management performance?

_vaporwave_ 3 days ago | parent | prev | next [-]

> a helpful order of magnitude estimate is that the hiring process all told costs the company approximately a year’s salary

It feels weird to gloss over this since transaction costs this high have a huge impact on how the system should be designed.

bparsons 3 days ago | parent | prev | next [-]

Unless you are measuring the output of people on simple assembly lines, it is very difficult to define "performance".

In a properly functioning team, people perform different, discrete roles which are probably not entirely understood by other team members or management.

directevolve 3 days ago | parent | prev | next [-]

I had assumed stack ranking was specifically designed to force managers to fire low performers, without relying on their individual judgment. Since nobody likes to fire, this overcomes the inertia, and since relying on personal judgment exposed you to legal risk and principal-agent problems, a simple rule was substituted. The author’s proposal to go back to managerial discretion would of course be incompatible with that intention.

I do wonder whether those implementing stack ranking are really that committed to a particular statistical model of employee productivity, or if they’re trying to solve a human and legal problem with an algorithm.

AtlasBarfed 3 days ago | parent | prev | next [-]

1) performance reviews are never aligned with employee value, because companies are strongly invested to take excess production from employees and transfer it to management, secondarily shareholders

2) they are also not aligned with the replacement cost of employees because the religion of management is that labor is effortlessly replaceable and low value

3) employee retention is not aligned with corporate performance in Machiavellian middle management, it is aligned with manager promotion for things like loyalty and maintaining fiefdom power, budgetary size, headcount, etc

4) there are no absolute or ever directly derived metrics in software development that have ever worked, to say nothing of other positions

Those are off the top of my head.

losthalo 3 days ago | parent | prev | next [-]

X + Y + [XY] = 8

X = the individual's contribution

Y = the contribution of the system they work within

[XY] = the interaction of the individual with the system

8 represents some measure of productivity, e.g., rate of errors, millions of dollars in profit, whatever you're measuring

The person who can solve for X is competent to rate people on their performance.

What to do instead of (destructively) rating people?

Build better systems for doing the work, make their work easier, give them psychological safety and job security so they can relax and enjoy their work and share better methods with each other.

(All paraphrased from W. Edwards Deming.)

Competition within organizations is for amateurs.

graycat 3 days ago | parent | prev | next [-]

> Hey, wait – is employee performance Gaussian distributed?

Well, the Gaussian distribution gives positive probability to any interval of the real line, including the whole real line (probability 1), so, strictly speaking, no.

But maybe the issue is a distribution with a bell curve or even with just a unique maximum and falling off monotonically from that maximum.

Well, then, in my college teaching, still no: Instead, commonly, roughly, there were three kinds of students: (1) understood the material at least reasonably well, (2) understood some of the material a little, and (3) should have just dropped the course but from me got by with a gentleman C. So, the distribution had a peak for each of (1) -- (3), three peaks, no Gaussian!

An approximate Gaussian is guaranteed, under meager assumptions, by the central limit theorem (CLT) for averages of random variables: the easiest case is independent, identically distributed (IID) variables, with more general cases depending on how advanced the CLT proof is. A proof due to Lindeberg-Feller long was, maybe still is, regarded as the most powerful CLT.
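For the easy IID case, a small sketch shows the effect numerically. The exponential distribution, the sample sizes, and the helper names are arbitrary choices for illustration; skewness is used as a simple stand-in for "how far from a bell curve".

```python
import random
from statistics import mean, pstdev

def skewness(xs):
    """Sample skewness: 0 for a symmetric distribution, 2 for an exponential."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def averaged_samples(rng, terms, count):
    """Draw `count` averages, each over `terms` IID exponential variables."""
    return [
        sum(rng.expovariate(1.0) for _ in range(terms)) / terms
        for _ in range(count)
    ]

rng = random.Random(1)
single = averaged_samples(rng, terms=1, count=5_000)
averaged = averaged_samples(rng, terms=50, count=5_000)
print(round(skewness(single), 2), round(skewness(averaged), 2))
```

A single exponential draw is heavily skewed, while the average of 50 draws is already close to symmetric. The sketch says nothing about cases where the assumptions fail, such as the three-peaked grade distribution described above.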

Apparently ~100 years ago, especially in education, the CLT was commonly regarded as standard, true, without question, maybe some law of nature. Maybe some of the people measuring IQ, SAT scores, etc. also thought this about the Gaussian.

For me, I, in mathematical and applied probability, care first about finite expectation, conditional independence, independence, several convergence results (e.g., the martingale convergence theorem), then IID, and hardly at all, Gaussian.

igorkraw 3 days ago | parent | prev | next [-]

The author looks at "observables" of performance without considering whether there might be confounders such as those discussed in great nuance here https://onlinelibrary.wiley.com/doi/full/10.1111/joes.12328 .

He cites similar work by William Shockley who taught both electrical engineering and scientific racism at Stanford https://en.wikipedia.org/wiki/William_Shockley (no swipe at the author, just pointing at the biased motivations of some of the researchers foundational to the idea of "high performers").

In general, when you see pareto structures or power laws, you should think of compound or cascade effects, which in human structures generally means some form of social mediation. Affinity for a desirable skill might be gaussian, but the selection process means that the people who _get_ to do that skill might become pareto shaped because if you aren't much better than the next guy, you wouldn't stably stay at the top. Similar logic can hold for other expressions.
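A toy sketch of the compounding point: the same Gaussian shocks are either added up or compounded multiplicatively. The number of rounds, the shock size, and the baseline of 10 are arbitrary assumptions, and the compounded result is log-normal, which is heavy-tailed but not literally a Pareto.

```python
import math
import random
from statistics import mean, median

def mean_to_median(values):
    """Ratio near 1 for symmetric data, well above 1 for a heavy right tail."""
    return mean(values) / median(values)

rng = random.Random(3)
PEOPLE, ROUNDS, SHOCK_SD = 5_000, 50, 0.2

additive, compounded = [], []
for _ in range(PEOPLE):
    shocks = [rng.gauss(0.0, SHOCK_SD) for _ in range(ROUNDS)]
    # Same shocks, two accumulation rules.
    additive.append(10.0 + sum(shocks))        # advantages add up
    compounded.append(math.exp(sum(shocks)))   # advantages multiply

print(round(mean_to_median(additive), 2), round(mean_to_median(compounded), 2))
```

The additive outcomes stay symmetric, while the compounded ones end up with a small group far above the median, even though the underlying inputs are identical and Gaussian.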

In general, I wish more people would read https://blackwells.co.uk/bookshop/product/Causality-by-Judea... or at least the more accessible https://mixtape.scunning.com/ before starting to conjecture from data about social systems - the math will tell you what you can and cannot speculate on.

(fun exercise: draw the causal models of IQ in https://dagitty.net/ and ponder the results)

throwaway48476 3 days ago | parent | prev | next [-]

Setting aside the issue of defining a function for 'employee performance', this glosses over the invisible interactions. An employee in a dysfunctional organization will perform worse than they would in a well-functioning one, where they don't have to waste time dealing with people and processes that are a hindrance.

dmurray 3 days ago | parent | prev | next [-]

> For what it’s worth, human height is also Gaussian, and that’s correlated with workplace success.

Height is generally not considered to be Gaussian and this is exactly the kind of statistics mistake the author seems to be accusing employers of. Adult height is somewhere between Gaussian and bimodal.
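The "between Gaussian and bimodal" shape is what a mixture of two Gaussian subpopulations produces. A minimal sketch, using unitless made-up parameters and not real height data: an equal-weight mixture of two Gaussians with the same standard deviation has two peaks only when the means are more than two standard deviations apart.

```python
import math

def mixture_density(x, gap, sd=1.0):
    """Equal-weight mixture of N(0, sd) and N(gap, sd) evaluated at x."""
    def pdf(mu):
        z = (x - mu) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))
    return 0.5 * pdf(0.0) + 0.5 * pdf(gap)

def count_modes(gap, sd=1.0, step=0.01):
    """Count local maxima of the mixture density on a fine grid."""
    low = -6.0 * sd
    points = int((gap + 12.0 * sd) / step)
    ys = [mixture_density(low + i * step, gap, sd) for i in range(points + 1)]
    return sum(1 for i in range(1, points) if ys[i - 1] < ys[i] > ys[i + 1])

for gap in (1.5, 3.0):
    print(gap, count_modes(gap))
```

With a small gap the mixture has a single peak but is wider and flatter than any one Gaussian, so it can look roughly bell-shaped without being Gaussian.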

TrainedMonkey 3 days ago | parent | prev | next [-]

Employee performance MEASUREMENT appears to be Gaussian distributed. To my first simple, and let's be real probably somewhat wrong, approximation there are roughly 3 things that go into it.

1. There is a certain skill in communicating all the important things you've done, we shall lump likability + politicking into this one for convenience.

2. There is a premium that is placed on shiny new features and saving the day heroics. A lot less priority is placed on refactoring and solving the problems before they require heroics.

3. Finally there are individual's technical and self-management skills. I.E. it's important to work on important things and be good at it.

philipov 3 days ago | parent | prev | next [-]

> How much revenue do you think a janitor or café staffer generates? Close to zero. The same goes for engineering. Someone has to do the unglamorous stuff, or you end up with a dysfunctional company, with amazing talent (on paper).

If the company would be dysfunctional without that janitor or software engineer, and not bring in as much revenue as a result, it sounds like the model that attributes close to zero revenue to them is already dysfunctional. If the company can't function without the janitor, then a significant portion of the revenue of the company should be attributed to them.

morkalork 3 days ago | parent | prev | next [-]

If you ever look at traditional human-driven sales data, you'll often see a small percentage of top performers absolutely dominating the total sales volume. So yes, employee performance is not Gaussian at all.
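To put a rough number on "dominating", here is a sketch comparing the share of the total held by the top 10% under each assumption. Both samples are synthetic and the parameters (mean 100 and standard deviation 15, Pareto shape 1.5) are invented for illustration, not taken from sales data.

```python
import random

def top_share(values, fraction=0.10):
    """Share of the total contributed by the top `fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)

rng = random.Random(11)
# Synthetic "sales per rep" under the two competing assumptions.
gaussian_sales = [rng.gauss(100.0, 15.0) for _ in range(10_000)]
pareto_sales = [rng.paretovariate(1.5) for _ in range(10_000)]

gaussian_top = top_share(gaussian_sales)
pareto_top = top_share(pareto_sales)
print(round(gaussian_top, 3), round(pareto_top, 3))
```

Under the Gaussian assumption the top tenth contributes only slightly more than a tenth of the total, so a top-heavy sales ledger is easy to tell apart from a bell curve.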

estebarb 3 days ago | parent | prev | next [-]

Some years ago I started doing graphs of code contributions across the year (yeah, wrong thing to measure, I know). A funny thing is that people considered "high performers" could be made the worst performers depending on how you cut the data. Basically, performance had a wave behavior, and nobody was at 100% all the time.

That is a good argument for diverse hiring: people will have bad days/seasons, fact of life. If the team is diverse, it is less probable that those bad days will correlate between different employees.
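The correlation argument has a simple formula behind it. If each of n team members has output with standard deviation s and every pair has correlation r, the variance of the team average is s^2 * (1/n + (1 - 1/n) * r). The sketch below just evaluates that; treating diversity as "lower r" is an assumption, and the team size and correlations are made up.

```python
import math

def team_average_sd(members, individual_sd, correlation):
    """Standard deviation of the team's average output.

    Assumes equal individual variance and one shared pairwise correlation.
    """
    variance = individual_sd ** 2 * (
        1.0 / members + (1.0 - 1.0 / members) * correlation
    )
    return math.sqrt(variance)

for r in (0.0, 0.3, 0.6, 1.0):
    print(r, round(team_average_sd(members=8, individual_sd=1.0, correlation=r), 3))
```

When slumps are perfectly correlated, adding people does nothing to smooth the team's output; the less they correlate, the steadier the average.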

wing-_-nuts 3 days ago | parent | prev | next [-]

One reason I'd never work for a company with a 'bottom 10% gets PIP'd' mentality is that it directly conflicts with my goal of self development. Of course I want to be on a great team where everyone performs better than I do. That's how I hone my craft! It just seems really wasteful to have to cull the bottom 10% of every team, even if that team is performing well. I wish there was a list of companies that subscribed to that mentality, so I could avoid them.

warrentr 3 days ago | parent | prev | next [-]

In Work Rules!, the book about Google, Bock claims (apparently using a lot of real data from Google) that employee performance follows a power law distribution.

hammock 3 days ago | parent | prev | next [-]

Why would performance be pareto distributed? Not saying it isn't, just wish we would unpack that idea a bit more.

IQ and other personality traits are gaussian, with which I would expect performance to be correlated

But, the mythical "10X employee" would seem to imply pareto, along with 80/20 notions of both personnel and an individual employee's day-to-day workload

How do we resolve this dichotomy?

Joel_Mckay 3 days ago | parent | prev | next [-]

"Hey wait - is [arbitrary metrics] Gaussian distributed?"

=3

psychoslave 2 days ago | parent | prev | next [-]

The main takeaway, to my mind, is "are we measuring the right things?"

Like, is the system helping to maximize happiness distribution within humanity while maintaining biodiversity in its highest concomitant expectable dynamics?

drc500free 3 days ago | parent | prev | next [-]

I've recently been working with a lot of service center productivity data. Staff productivity (customers/hour) is pretty close to a gaussian, with some skew towards many slight underperformers and few overperformers.

However, any single customer interaction is exponential or weibull distributed.
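These two observations fit together: many skewed interaction times added up over a work week give a roughly Gaussian customers/hour figure. A minimal sketch under invented assumptions (identical staff, exponential handling times with a 6 minute mean, 40 hours per person), which does not model the skew toward slight underperformers mentioned above:

```python
import random
from statistics import mean, pstdev

def customers_served(rng, hours, mean_minutes):
    """Count customers fully handled within `hours`, one after another."""
    limit = hours * 60.0
    elapsed, served = 0.0, 0
    while True:
        # Each interaction length is exponential with the given mean.
        elapsed += rng.expovariate(1.0 / mean_minutes)
        if elapsed > limit:
            return served
        served += 1

rng = random.Random(5)
HOURS, MEAN_MINUTES, STAFF = 40, 6.0, 1_000
per_hour = [customers_served(rng, HOURS, MEAN_MINUTES) / HOURS for _ in range(STAFF)]

center, spread = mean(per_hour), pstdev(per_hour)
within_one_sd = sum(abs(x - center) <= spread for x in per_hour) / STAFF
print(round(center, 2), round(spread, 2), round(within_one_sd, 2))
```

A Gaussian puts about 68% of values within one standard deviation of the mean, so the last number gives a quick sanity check on the shape.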

PaulHoule 3 days ago | parent | prev | next [-]

It depends on the job. If you are interested in the American caste system you should read this classic

https://www.amazon.com/Remember-me-God-Myron-Kaufmann/dp/B00...

Which tells the story of a Jewish person who fails to persevere against prejudice in a multifaceted and sensitive way. In one scene he gets a job as a bank teller and then realizes in some jobs you’ve got the potential to screw up but no potential to distinguish yourself. The world needs people to milk cows every morning, a job you can screw up but not do it 10x better than competent, there is no Pareto or other “exceptional events” distributions for many essential jobs. ER doctors, taxicab drivers, astronauts, etc.

(Productivity is a product of the system + the people)

I worked on one system that had a 40 minute build if you wanted it to be reliable; the people I picked it up from could not build it reliably, which is why the project had been going in circles for 1.5 years before I showed up. With no assistance (and orders that I was not supposed to spend time speeding up my build because it didn’t directly help the customer) I got it to a 20 minute build.

Other folks on the team thought I was a real dope because my build took too long and I was always complaining, but they couldn’t build it reliably at all. I made two major releases of a product with revolutionary performance in one year, at which point I felt that I’d done the honorable thing and that I’d feel less backlash anywhere else whether or not I was creating more value, so I moved on, and was told by recruiters that they hadn’t found a replacement for me in six months.

Had the place I was working at had a 2 minute build they might never have hired me because they would have had the product ready long before.

29athrowaway 3 days ago | parent | prev | next [-]

I guess developers should have a pay structure similar to sales, where you make part of your money from bonuses tied to results. But those results are hard to evaluate because something shipped fast can have bugs found after the reward date.

mdnahas 2 days ago | parent | prev | next [-]

I’m an economist who has looked into something like this. This paper is about intra-company performance. I’ve looked at incomes of the whole population. There, you find two distributions. Wages seem to follow a log-normal distribution, while income from investments follows a heavy-tailed distribution, like a Pareto.

Wages tend to be smaller than asset income. Top sports players and musicians work for wages and become billionaires. Startup founders, who own assets, become trillionaires.

Obviously, there are differences. Wages are not productivity. (But the article didn’t say how productivity was measured.) Also, a company can choose who joins and leaves it. So one company’s wage distribution doesn’t have to follow the distribution of the wider economy.

xmly 3 days ago | parent | prev | next [-]

Well, managers are trying to make it Gaussian, but underlying is actually power law.

datadrivenangel 3 days ago | parent | prev | next [-]

If you assume that people are promoted to their level of incompetence -- terminal responsibility level, then you would expect that level adjusted performance should approach a Gaussian?

thesz 3 days ago | parent | prev | next [-]

> IQ is Gaussian. The Big Five Personality Trait known as Conscientiousness is likewise Gaussian. For what it’s worth, human height is also Gaussian...

Height cannot be negative, thus, it is not Gaussian. IQ cannot be negative either. A great many things that most people think are Gaussian are not.

One such distribution that describes one-sided values, the log-normal distribution (logarithms of values are distributed normally), has the interesting property that for some d, values x=mean+d are more probable than values x=mean-d (heavy tail). Also, a sum of log-normal-distributed values does not converge to a Gaussian distribution.

uoaei 3 days ago | parent | prev | next [-]

I suspect you can dig into any metric here and find that they are explicitly determined in terms of an assumption of underlying normality.

irrational 3 days ago | parent | prev | next [-]

Is it Q4 at a lot of companies? How many companies align their fiscal calendar with the yearly calendar? Our Q4 is March-May.

pajko 2 days ago | parent | prev | next [-]

The problem is that performance cannot really be measured, so there's nothing to talk about. KPIs like LOC/month or resolved tickets/month do not tell you anything. Nor does completed projects/year, nor does the project's size. Name a single thing which does not depend on external stuff. Have fun with an idiotic client, or an idiotic management. If you want to remove the people who do not work, then do that. If you do not know who they are, fire your managers. Or better, fire yourself too, as it seems like you don't know what you are doing. If you want to lower costs, fire managers and sales people first.

spyckie2 3 days ago | parent | prev | next [-]

So…

1) treat poor performers as bad hires and ignore them in your dataset

2) treat 10x performers as needing to be promoted and also ignore them in your data

3) treat everyone else as relatively equal

…and use “Pareto distribution” and “no one has mentioned this before” to write a blog post?

Is the point of the article to give people who disagree with 10% corporate culling a pseudo-intellectual economic buzzword argument to stroke their hatred of an inefficient hr practice? If so:

1) 10% culling in performance review is a mechanism to cull “bad hires”. I find it difficult to understand how the author can argue it’s a bad practice and then state that you cull bad hires from your dataset without thinking that they are the same thing or at least largely overlapping.

2) If the author is proposing to separate performance review, culling bad hires, and promotions, into 3 separate systems and assume no overlap, he should think through the structural issues more. While it’s possible to design a management structure where the organization is at a constant state of no bad hires, all 10xers promoted, that is putting a lot of responsibility on individual managers to run review, culling and promotion by themselves at a very high level. It’s brittle - a few bad managers not running the system can easily leave your organization bloated with bad hires and no fallback (fallback = performance review process).

3) The system of performance review is equally about risk management to the business as it is about rewarding your employees. IMO, the author’s framing simplifies the problem too much and pushes the complexity out for other people to deal with. It’s the kind of thinking that is damaging to organizations… I wonder if there is a process to cull this kind of thinking from your org… wait what time of year is it??

hinkley 3 days ago | parent | prev | next [-]

Doesn’t quite work with Heart Shaped Box, but ok.

cynicalpeace 3 days ago | parent | prev | next [-]

If you take this as true- what does it imply?

soniman 3 days ago | parent | prev | next [-]

It's the marginal dollar that contributes profit so the marginal employees are actually the most profitable.

xphilter 3 days ago | parent | prev [-]

Yeah good luck. I don’t think any hr decisions have ever been about data; it’s about following norms. If you can get the rand corp or heritage foundation to adopt this policy then maybe corporations would look into it.