bhouston 3 days ago

I would feel better if this was derived from empirical data rather than just rhetoric. This seems super testable, no? There is probably a ton of data already in different industries with regards to productivity.

Even if human talent has a Pareto distribution (which is not clear), the people employed by a company are a selected subset of that population, which would likely have a different distribution depending on how they are selected and on the task at hand.

I think that any of these simplified distributions are likely not generalizable across companies and industries (e.g. the productivity of AWS or Google employees is likely not distributed like that of McDonald's or Walmart employees, because of differences in hiring procedures and in the nature of the tasks).

Get hard data within the companies and industry you are in, and then you can make some arguments. Otherwise, I feel it is too easy to just be building a sand castle that has no solid footing.

Miraltar 3 days ago

To me it says that our system is built on a reasonable but untested assumption (performance is Gaussian), and by replacing it with an equally reasonable assumption (performance is Pareto-distributed), suddenly our system looks stupid. It isn't really offering a solution, but a new perspective.
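To make the two assumptions concrete, here is a minimal Python sketch comparing how much of total output the top 10% produce when per-person output is drawn from a Gaussian versus a Pareto distribution. The parameters (mean 100, SD 15 for the Gaussian; shape 1.5 for the Pareto) are arbitrary illustrative assumptions, not values from the article.

```python
import random

def top_share(values, frac=0.1):
    """Fraction of the total contributed by the top `frac` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * frac))
    return sum(ordered[:k]) / sum(ordered)

rng = random.Random(42)
n = 10_000
# Gaussian "output": mean 100, SD 15, clipped at 0 so output stays non-negative
gaussian = [max(0.0, rng.gauss(100, 15)) for _ in range(n)]
# Pareto "output": shape alpha = 1.5, minimum value 1 (illustrative choice)
pareto = [rng.paretovariate(1.5) for _ in range(n)]

print(f"Gaussian: top 10% produce {top_share(gaussian):.0%} of output")
print(f"Pareto:   top 10% produce {top_share(pareto):.0%} of output")
```

Under the Gaussian, the top 10% contribute only slightly more than their headcount share; under the Pareto, they contribute a large fraction of the total, which is why the choice of assumption matters so much for a ranking system.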

pama 3 days ago

I thought that Bonus Content #1 and the references further down the article were reasonably convincing. It would be great if large companies disclosed such details, but it is unlikely.

wavemode 3 days ago

> I would feel better if this was derived from empirical data rather than just rhetoric.

This exact statement applies to the practice of Gaussian performance ranking. It is pure corporate politics, it isn't founded in sound statistics.

The present author at least provides multiple sources of statistical evidence for their beliefs, if you read the footnotes.

KK7NIL 3 days ago

The problem is that intellectual productivity is generally not possible to measure directly, so you instead end up with indirect measurements that assume a Gaussian distribution.

IQ is famously Gaussian distributed... mainly because it's defined that way, not because human "intelligence" (good luck defining that) is Gaussian.
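A minimal sketch of how a normed score gets its Gaussian shape by construction: rank the raw scores, convert each rank to a percentile, and map the percentile through the inverse CDF of a normal with mean 100 and SD 15. This is a simplified illustration of percentile-based norming, not the exact procedure of any particular test; ties are ignored.

```python
import random
from statistics import NormalDist, mean, stdev

def to_iq_scale(raw_scores, mu=100.0, sigma=15.0):
    """Map raw scores to an IQ-style scale by percentile rank (ties ignored).

    The output is approximately normal with mean `mu` and SD `sigma` no matter
    how the raw scores are distributed: the shape comes from the transform.
    """
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i])
    target = NormalDist(mu, sigma)
    scaled = [0.0] * n
    for rank, i in enumerate(order):
        p = (rank + 0.5) / n  # mid-rank percentile, strictly inside (0, 1)
        scaled[i] = target.inv_cdf(p)
    return scaled

rng = random.Random(0)
raw = [rng.expovariate(1.0) for _ in range(5_000)]  # strongly right-skewed raw scores
iq = to_iq_scale(raw)
print(f"mean={mean(iq):.1f} sd={stdev(iq):.1f}")
```

The raw scores here are exponential (heavily skewed), yet the scaled scores come out with mean about 100 and SD about 15, and the ordering of people is unchanged.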

If you look at board game Elo ratings (poor test for intelligence but we'll ignore that), they do not follow a Gaussian distribution, even though Elo assumes a Gaussian distribution for game outcomes (but not the population). So that's good evidence that aptitude/skill in intellectual subjects isn't Gaussian (but it's also not Pareto iirc).
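For reference, a sketch of the two common forms of the Elo expected-score curve. The Gaussian form assumes each player's single-game performance is normal with SD 200 around their rating (my understanding of Elo's original setup; treat the value as an assumption), so the difference between two players' performances has SD 200 * sqrt(2). The logistic form on a 400-point scale is what most implementations use today. The K-factor of 20 in the update is an illustrative choice.

```python
import math
from statistics import NormalDist

def expected_score_normal(r_a, r_b, sigma=200.0):
    """Expected score of A against B if each player's single-game performance is
    normal with SD `sigma` around their rating; the difference of two such
    performances is then normal with SD sigma * sqrt(2)."""
    return NormalDist(0.0, sigma * math.sqrt(2)).cdf(r_a - r_b)

def expected_score_logistic(r_a, r_b):
    """Logistic curve on a 400-point scale, as used by most modern Elo implementations."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(rating, expected, actual, k=20):
    """Elo update: move the rating toward the observed result (1 win, 0.5 draw, 0 loss)."""
    return rating + k * (actual - expected)

print(f"{expected_score_normal(1600, 1400):.3f}")    # Gaussian form
print(f"{expected_score_logistic(1600, 1400):.3f}")  # logistic form
```

At a 200-point gap both curves give the stronger player an expected score of roughly 0.76. Either way, the Gaussian enters through individual game outcomes, not through the distribution of ratings across the player population.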

jlawson 3 days ago

All polygenic traits would be Gaussian by default under the simplest assumptions.

E.g. if there are N loci, each locus has X alleles, and some of those alleles increase the trait more than others, the trait will ultimately present as a Gaussian distribution.

i.e. if there are lots of genes that affect IQ, IQ will follow a Gaussian curve across the population.
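A toy simulation of the additive model described above, assuming 100 loci, two allele copies per locus with an "increasing" allele frequency of 0.5, and per-locus effect sizes drawn uniformly from 0.5 to 1.5 (all arbitrary choices). The trait is just a sum of many independent contributions, so by the central limit theorem its skewness should come out close to zero.

```python
import random
from statistics import mean, stdev

def skewness(xs):
    """Sample skewness (third standardized moment)."""
    m, s = mean(xs), stdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def simulate_additive_trait(n_people, n_loci, rng):
    """Additive toy model: trait = sum over loci of effect * copies of the
    'increasing' allele (0, 1 or 2 copies, allele frequency 0.5)."""
    effects = [rng.uniform(0.5, 1.5) for _ in range(n_loci)]
    traits = []
    for _ in range(n_people):
        total = 0.0
        for effect in effects:
            copies = (rng.random() < 0.5) + (rng.random() < 0.5)
            total += effect * copies
        traits.append(total)
    return traits

rng = random.Random(1)
trait = simulate_additive_trait(n_people=5_000, n_loci=100, rng=rng)
print(f"mean={mean(trait):.1f} sd={stdev(trait):.2f} skew={skewness(trait):+.3f}")
```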

KK7NIL 3 days ago

Very interesting point; this is a close corollary of the central limit theorem, no?

Doesn't this assume a linear relationship between relevant alleles and the given trait though?

boothby 3 days ago

The missing assumptions are that the number of genes is large, independently distributed (i.e. no correlations among different genes), and identically distributed. And the whopper: that nurture has no impact.

You can weaken some of those assumptions, but there are strong correlations amongst various genes, and between genes and nurture. And, one "nurture" variable is overwhelmingly correlated to many others: wealth.

Unpacking wealth a little, for the sake of a counterexample: one can consider it to be the sum of a huge number of random variables. If the central limit theorem applied to any sum of random variables, wealth would be Gaussian, right? Nope, it's much closer to a Pareto distribution.

In summary: the conclusion of the central limit theorem is very appealing to apply everywhere. But like any theorem, you need to pay close attention to the preconditions before you make that leap.
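A small sketch of why the preconditions matter: the classical central limit theorem needs, among other things, finite variance. Summing 10,000 uniform terms, no single term matters; summing 10,000 Pareto terms with shape 1.5 (finite mean, infinite variance), the largest single term can account for a noticeable share of the total, and the sum does not settle into a Gaussian. This only illustrates the finite-variance condition; it is not a model of how wealth actually accumulates.

```python
import random

def largest_term_share(samples):
    """Share of the total contributed by the single largest term."""
    return max(samples) / sum(samples)

rng = random.Random(7)
n = 10_000
# Finite variance: uniform terms, where the classical CLT applies.
uniform_terms = [rng.uniform(0.0, 2.0) for _ in range(n)]
# Infinite variance: Pareto terms with shape 1.5 (finite mean, infinite variance).
pareto_terms = [rng.paretovariate(1.5) for _ in range(n)]

print(f"uniform: largest term is {largest_term_share(uniform_terms):.4%} of the sum")
print(f"pareto:  largest term is {largest_term_share(pareto_terms):.4%} of the sum")
```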

jlawson 3 days ago

"Number of genes is large" is what I said, that's not a missing assumption, I said that explicitly.

The nurture/nature relationship to IQ has been well-studied for many decades. There are easy and obvious ways to figure this out by looking at identical twins raised in different homes, adopted children and how much they resemble their birth parents vs adopted parents, etc. Idealists always like to drag out nurture effects on IQ like it's some kind of mystery when it's a well-studied and well-solved empirical question.

SideQuark 3 days ago

It easily accommodates nurture effects for the same reasons: an incredible number of nurture factors are themselves Gaussian-distributed, and the population sampled is large.

Wealth being Pareto-distributed would imply its effects on nurture are not Pareto, since the effects of wealth are not proportional to wealth. At best there are diminishing returns. Having 100x the wealth won’t give 100x the intelligence, 100x the lifespan, etc. And once you realize this, it’s not far until the math yields another Gaussian.

Bootvis 3 days ago

It does. A lognormal distribution would model that better; it gives a nice right tail, so maybe it is a useful toy model.
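A sketch of that toy model, assuming (purely for illustration) that each copy of an "increasing" allele multiplies the trait by 1.05 instead of adding to it, at 100 loci with allele frequency 0.5. The log of the trait is then a sum of many small terms, so the trait itself is approximately log-normal, and its mean sits noticeably above its median because of the right tail.

```python
import random
from statistics import mean, median

def simulate_multiplicative_trait(n_people, n_loci, effect, rng):
    """Non-additive toy model: each copy of the 'increasing' allele multiplies
    the trait by (1 + effect), so log(trait) is a sum and the trait is
    approximately log-normal."""
    traits = []
    for _ in range(n_people):
        copies = sum((rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n_loci))
        traits.append((1.0 + effect) ** copies)
    return traits

rng = random.Random(3)
trait = simulate_multiplicative_trait(5_000, 100, effect=0.05, rng=rng)
# A right-skewed distribution has its mean pulled above its median.
print(f"mean={mean(trait):.2f} median={median(trait):.2f}")
```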

KK7NIL 3 days ago

A Gaussian with a long right tail fits the Elo ratings of active chess players very well, as I discussed in adjacent comments here.

jlawson 3 days ago

Isn't that just because there is a practical limit to how bad at chess someone can be? That is to say, making utterly random moves.

But there is no limit to how good they can be.

So of course the right tail is longer; the left tail is cut off!
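A quick way to see that effect, with made-up numbers: draw a Gaussian latent skill (mean 1500, SD 300), then discard everyone below a hypothetical floor of 1200. The latent sample has skewness near zero, while the truncated sample is clearly right-skewed even though nothing happened to the right tail.

```python
import random
from statistics import mean, stdev

def skewness(xs):
    """Sample skewness (third standardized moment)."""
    m, s = mean(xs), stdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

rng = random.Random(5)
# Hypothetical latent skill: Gaussian with mean 1500 and SD 300.
latent = [rng.gauss(1500, 300) for _ in range(20_000)]
# Hypothetical floor: nobody is measured below 1200, so the left tail is cut off.
FLOOR = 1200
observed = [r for r in latent if r >= FLOOR]

print(f"latent skew   = {skewness(latent):+.2f}")
print(f"observed skew = {skewness(observed):+.2f}")
```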

EnergyAmy 3 days ago

Do you have a reference for Elo ratings not being Gaussian? A casual search shows lots of graphs and discussions saying it is.

KK7NIL 3 days ago

Look at my reply to bhouston.

Elo ratings for active players are close to Gaussian, but not quite: they show a very clear asymmetry, especially for old-school over-the-board (OTB) Elo compared to online Glicko-2.

The active-players restriction is a big one, and one I didn't assume in my original statement.

bhouston 3 days ago

> so you instead end up with indirect measurements that assume a Gaussian distribution.

100%. I was going to write something similar.

> If you look at board game Elo ratings (poor test for intelligence but we'll ignore that), they do not follow a Gaussian distribution, even though Elo assumes a Gaussian distribution for game outcomes (but not the population). So that's good evidence that aptitude/skill in intellectual subjects isn't Gaussian (but it's also not Pareto iirc).

Interesting; yeah, Elo is quite interesting. One can view hiring in a company as something like selecting people whose Elo is above a certain score, but with some type of error distribution on top of that, probably Gaussian error. So what does a one-sided Elo distribution look like with Gaussian error in picking people above that Elo limit?
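One way to explore that question is a small Monte Carlo sketch. Everything here is an assumption: true skill is Gaussian (mean 1500, SD 300), the hiring process sees true skill plus Gaussian noise, and anyone whose observed score clears a cutoff of 1800 is hired. With zero noise you get a hard one-sided (truncated) distribution; as the noise grows, the left edge softens, some hires fall below the cutoff, and the average true skill of hires drops.

```python
import random
from statistics import mean

def simulate_hiring(n_applicants, cutoff, noise_sd, rng):
    """Return the true skills of hired applicants under a toy model:
    true skill ~ Gaussian(1500, 300), observed = true + Gaussian(0, noise_sd),
    hire if observed >= cutoff. All parameters are assumptions."""
    hired = []
    for _ in range(n_applicants):
        true_skill = rng.gauss(1500, 300)
        observed = true_skill + rng.gauss(0, noise_sd)
        if observed >= cutoff:
            hired.append(true_skill)
    return hired

rng = random.Random(11)
for noise in (0, 150, 300):
    hired = simulate_hiring(50_000, cutoff=1800, noise_sd=noise, rng=rng)
    below = sum(skill < 1800 for skill in hired) / len(hired)
    print(f"noise sd={noise:>3}: hired={len(hired)}, "
          f"mean true skill={mean(hired):.0f}, below cutoff={below:.0%}")
```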

KK7NIL 3 days ago

Lichess has public population data (they use a modified version of Glicko-2 which is basically an updated version of Elo's system): https://lichess.org/stat/rating/distribution/blitz

It's basically a Gaussian with a very long right tail.

The big caveat here is that these are the ratings of weekly active players. If we instead included casual players, I suspect we'd see something resembling a Pareto distribution.

doctorpangloss 3 days ago

The big caveat is that it's trivial to compute AIC, BIC and other goodness-of-fit measures for a distribution. If you think it's such-and-such a distribution, go for it. In my experience, in this specific case of chess ratings and in the broader case of test scores, skew-normal and log-normal fits are worse than a plain Gaussian.

I have no idea why you would believe increasing the population would make this Gaussian distribution look Pareto, when the exact opposite is true - increasing populations make things look more Gaussian - in all natural circumstances.
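The fit comparison mentioned above is easy to sketch with the standard library alone. This fits a Gaussian and a log-normal by maximum likelihood and reports AIC and BIC (lower is better; both models have two parameters here, so the ranking reduces to log-likelihood). The data below are synthetic placeholders, not chess ratings or test scores.

```python
import math
import random
from statistics import fmean, pstdev

def _normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def normal_loglik(xs):
    """Maximum-likelihood Gaussian fit; returns (log-likelihood, number of parameters)."""
    mu, sigma = fmean(xs), pstdev(xs)  # the MLE uses the population SD
    return sum(_normal_logpdf(x, mu, sigma) for x in xs), 2

def lognormal_loglik(xs):
    """Maximum-likelihood log-normal fit (requires x > 0): fit a Gaussian to log(x),
    then subtract log(x) per point, the Jacobian of the log transform."""
    logs = [math.log(x) for x in xs]
    mu, sigma = fmean(logs), pstdev(logs)
    return sum(_normal_logpdf(y, mu, sigma) - y for y in logs), 2

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik

rng = random.Random(2)
data = [rng.lognormvariate(0.0, 0.5) for _ in range(2_000)]  # synthetic placeholder data
for name, fit in (("normal", normal_loglik), ("log-normal", lognormal_loglik)):
    ll, k = fit(data)
    print(f"{name:>10}: AIC={aic(ll, k):.1f}  BIC={bic(ll, k, len(data)):.1f}")
```

Swapping in real rating data (all values positive) is the only change needed to run the comparison described in the comment.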

KK7NIL 3 days ago

I was conjecturing that the distribution would be closer to Pareto for everyone (including people who've never learned how to play chess), hence why I said that "active players" is a big caveat.

> increasing populations make things look more Gaussian - in all natural circumstances.

This is just not the case, there's plenty of "natural circumstances" where populations have non-Gaussian distributions.

Perhaps you meant a specific type of population, like chess ratings? I'd be interested in seeing what you find there, but all I've found shows significantly distorted tails (not to mention a skew from 1500).

JackFr 3 days ago

Good question - do the bad players play less because they are bad, or are they bad because they play less?

bhouston 3 days ago

> Good question - do the bad players play less because they are bad, or are they bad because they play less?

Both, for sure. If you don't practice, you will never rise much above bad. But if you are bad and not progressing, you won't play much, because losing isn't rewarding.

For players with low Elo ratings, one would almost need to compare their rating history against the number of games played and see whether they followed the expected Elo progression.

I wonder if you can estimate with any accuracy where a player will eventually plateau given just a small-ish sample of their first games, i.e. extrapolate the trajectory from how they start and progress. That would be interesting. Given how well studied chess is, I expect this has already been done to some extent somewhere.
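A minimal sketch of one way to attempt that, under a hypothetical learning-curve model r(n) = plateau - gap * exp(-n / tau), where n is the number of games played. For a fixed tau the model is linear in its other two parameters, so the sketch solves least squares in closed form and grid-searches tau. Real rating histories are noisy and players need not follow this curve, so treat the extrapolated plateau as a rough guess.

```python
import math
import random

def fit_plateau(games, ratings, taus=range(5, 501, 5)):
    """Fit r(n) = plateau - gap * exp(-n / tau) to an early rating history.

    For each tau, regress ratings on x = exp(-n / tau); the intercept is the
    value as n -> infinity (x -> 0), i.e. the plateau. Returns (plateau, tau)."""
    best = None
    my = sum(ratings) / len(ratings)
    for tau in taus:
        xs = [math.exp(-n / tau) for n in games]
        mx = sum(xs) / len(xs)
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0:
            continue  # need at least two distinct game counts
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ratings)) / sxx
        intercept = my - slope * mx
        sse = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ratings))
        if best is None or sse < best[0]:
            best = (sse, intercept, tau)
    if best is None:
        raise ValueError("need ratings from at least two different game counts")
    return best[1], best[2]

# Hypothetical player: starts near 1100, heading toward 1650, with rating noise.
rng = random.Random(6)
games = list(range(60))  # first 60 games
ratings = [1650 - 550 * math.exp(-n / 80) + rng.gauss(0, 30) for n in games]
plateau, tau = fit_plateau(games, ratings)
print(f"estimated plateau ~ {plateau:.0f} (tau = {tau})")
```

With only a few dozen noisy games, the fitted plateau can be quite sensitive to the noise, which hints at why early prediction is hard.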

drcwpl 3 days ago

Agree with you, although, rhetorically speaking, I have come across many instances of what the author refers to: "low performers are 3x as common as high performers." This is unfortunate, as I always think you should do your best, and as Tyler Cowen says, Average is Over. So I agree it would have been much better to use empirical data to back up this claim in particular.

hinkley 3 days ago

> I think that any of these simplified distributions are likely not generalizable across companies and industries

It’s going to be multivariate statistics with dependent variables. The quality of the non-developers at a company affects the quality of developers it can retain, and the quality of the developers you have affects the quality of developers you can recruit and improve. Almost all the people I’d want to work with again left my last employer before I did.

You can take on more and more work yourself, but it causes everyone around you to disengage. At some point you have to realize it’s more fruitful, emotionally and mathematically, to get your coworkers to produce one more unit of forward progress a month than to produce it yourself, because it’s 2% for the team one way and 5-10% the other.

groby_b 3 days ago

> There is probably a ton of data already in different industries with regards to productivity.

Uh. Not really. Our industry is notoriously bad at measuring productivity.

And the bigger problem is that when we do try to measure it (the "performance review"), we like grading on a Gaussian curve. We'll never know if that's correct, because we put our thumb on the scale.

An even bigger problem is that productivity is strongly influenced by completely non-technical factors: how enthusiastic folks are about what they are doing [1], how much variety their tasks have [2], what their peers are like, etc. (Of course, that whole field of study has issues rooted in the inability to measure precisely as well.)
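To illustrate the thumb-on-the-scale point, here is a minimal sketch of rank-based forced grading with hypothetical 20/70/10 quotas. Because labels are assigned by rank, Gaussian-distributed and Pareto-distributed scores produce exactly the same label counts, so the ratings themselves carry no information about which distribution was underneath.

```python
import random
from collections import Counter

def forced_curve(scores, quotas=(("exceeds", 0.2), ("meets", 0.7), ("below", 0.1))):
    """Assign labels by rank so label counts always match the (hypothetical) quotas,
    regardless of how the underlying scores are actually distributed."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = [None] * len(scores)
    start = 0
    for k, (label, share) in enumerate(quotas):
        # The last bucket absorbs any rounding remainder.
        end = len(scores) if k == len(quotas) - 1 else start + round(share * len(scores))
        for i in order[start:end]:
            labels[i] = label
        start = end
    return labels

rng = random.Random(4)
gaussian = [rng.gauss(100, 15) for _ in range(1_000)]
pareto = [rng.paretovariate(1.5) for _ in range(1_000)]
print(Counter(forced_curve(gaussian)))
print(Counter(forced_curve(pareto)))
```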

Ultimately, it's a squishy judgment applied by humans.

[1] https://www.semanticscholar.org/paper/What-Predicts-Software...

[2] https://research.google/pubs/what-predicts-software-develope...