| ▲ | sega_sai 3 days ago |
| Least squares and PCA minimize different loss functions. One is the sum of squares of vertical (y) distances, the other is the sum of closest distances to the line. That introduces the differences. |
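A minimal NumPy sketch of the two losses (synthetic data, with an arbitrary slope and noise level chosen just for illustration): the OLS slope minimizes squared vertical residuals, while the PCA/TLS line follows the leading eigenvector of the sample covariance, which minimizes squared orthogonal distances.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 0.5 * x + rng.normal(scale=0.5, size=200)

    # OLS: slope that minimizes the sum of squared vertical (y) residuals
    slope_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

    # PCA / total least squares: the line direction is the leading eigenvector
    # of the sample covariance, which minimizes squared orthogonal distances
    eigvals, eigvecs = np.linalg.eigh(np.cov(x, y))
    v = eigvecs[:, -1]            # eigenvector of the largest eigenvalue
    slope_tls = v[1] / v[0]

    print(slope_ols, slope_tls)   # the two fitted lines generally differ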
|
| ▲ | anArbitraryOne 2 days ago | parent | next [-] |
| "...sum of squared distances to the line" would be a better description. But it also depends entirely on how covariance is estimated |
|
| ▲ | CGMthrowaway 3 days ago | parent | prev | next [-] |
| That makes sense. Why does least squares skew the line downwards though (vs. some other direction)? Seems arbitrary. |
| |
| ▲ | mr_toad 3 days ago | parent | next [-] | | The Pythagorean distance would assume that some of the distance (difference) is on the x axis, and some on the y axis, and the total distance is orthogonal to the fitted line. OLS assumes that x is given, and the distance is entirely due to the variance in y (so parallel to the y axis). It’s not the line that’s skewed, it’s the space. | |
| ▲ | sega_sai 3 days ago | parent | prev [-] | | I think it has to do with the ratio of \Sigma_xx, \Sigma_yy. I don't have time to verify that, but it should be easy to check analytically. |
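A quick numerical check of that, under an assumed errors-in-variables setup (equal noise in x and y, true slope 1): the OLS slope uses only s_xy / s_xx and is attenuated toward zero, which is the "downward skew" asked about above, while the closed-form orthogonal (TLS) slope involves both s_xx and s_yy.

    import numpy as np

    rng = np.random.default_rng(1)
    t = rng.normal(size=2000)          # latent variable on the true line y = x
    x = t + rng.normal(size=2000)      # noise in x
    y = t + rng.normal(size=2000)      # noise in y

    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y)[0, 1]

    slope_ols = sxy / sxx              # ~0.5: attenuated toward zero
    # closed-form orthogonal (TLS) slope; note it uses both sxx and syy
    slope_tls = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

    print(slope_ols, slope_tls)        # ~0.5 vs ~1.0 for the true slope of 1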
|
|
| ▲ | ryang2718 3 days ago | parent | prev [-] |
| I find it helpful to view least squares as fitting the noise to a Gaussian distribution. |
| |
| ▲ | MontyCarloHall 3 days ago | parent | next [-] | | They both fit Gaussians, just different ones! OLS fits a 1D Gaussian to the set of errors in the y coordinates only, whereas TLS (PCA) fits a 2D Gaussian to the set of all (x,y) pairs. | | |
| ▲ | ryang2718 3 days ago | parent [-] | | Well, that was a knowledge gap, thank you! I certainly need to review PCA, but Python makes it a bit too easy. |
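A rough illustration of the two Gaussian fits described above, with made-up parameters: the PCA/TLS line is the major axis of the 2D Gaussian whose MLE parameters are the sample mean and covariance of all (x, y) pairs, while OLS pairs a slope with a 1D Gaussian fitted to the vertical residuals only.

    import numpy as np

    rng = np.random.default_rng(2)
    pts = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=500)
    x, y = pts[:, 0], pts[:, 1]

    # PCA / TLS: major axis of the 2D Gaussian whose MLE parameters are the
    # sample mean and sample covariance of all (x, y) pairs
    mu = pts.mean(axis=0)
    cov2d = np.cov(pts.T)
    major_axis = np.linalg.eigh(cov2d)[1][:, -1]
    slope_pca = major_axis[1] / major_axis[0]

    # OLS: the slope whose vertical residuals give the smallest-variance
    # 1D Gaussian fit (equivalently, the Gaussian MLE of y given x)
    slope_ols = cov2d[0, 1] / cov2d[0, 0]
    resid = y - (mu[1] + slope_ols * (x - mu[0]))
    resid_var = np.mean(resid ** 2)    # MLE variance of the residual Gaussian

    print(slope_pca, slope_ols, resid_var)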
| |
| ▲ | LudwigNagasena 3 days ago | parent | prev | next [-] | | The OLS estimator is the minimum-variance linear unbiased estimator (the Gauss-Markov theorem) even without the assumption of a Gaussian distribution. | | |
| ▲ | rjdj377dhabsn 3 days ago | parent [-] | | Yes, and if I remember correctly, you get the Gaussian because it's the maximum entropy (fewest additional assumptions about the shape) continuous distribution for a given variance. | | |
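A small simulation of the unbiasedness part of that claim, with arbitrary design points and a uniform (non-Gaussian) noise distribution: the OLS slope still averages out to the true slope; Gaussianity is not needed for unbiasedness, only for things like exact small-sample inference.

    import numpy as np

    rng = np.random.default_rng(3)
    x = np.linspace(-1, 1, 50)
    true_slope = 2.0

    # average the OLS slope over many replications with non-Gaussian noise;
    # unbiasedness does not require the errors to be Gaussian
    slopes = []
    for _ in range(5000):
        y = true_slope * x + rng.uniform(-1, 1, size=x.size)
        slopes.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

    print(np.mean(slopes))             # close to 2.0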
| |
| ▲ | contravariant 3 days ago | parent | prev [-] | | Both of these do, in a way. They just differ in which Gaussian distribution they're fitting to, and how, I suppose. PCA is effectively moment matching, least squares is maximum likelihood. These correspond to the two ways of minimizing the Kullback-Leibler divergence to or from a Gaussian distribution. |
|