| ▲ | paulfharrison 3 days ago | |
For linear models, least squares leads to the BLUE estimator: Best Linear Unbiassed Estimator. This acronym is doing a lot of work with each of the words having a specific technical meaning. Fitting the model is also "nice" mathematically. It's a convex optimization problem, and in fact fairly straightforward linear algebra. The estimated coefficients are linear in y, and this also makes it easy to give standard errors and such for the coefficients! Also, this is what you would do if you were doing Maximum Likelihood assuming Gaussian distributed noise in y, which is a sensible assumption (but not a strict assumption in order to use least squares). Also, in a geometric sense, it means you are finding the model that puts its predictions closest to y in terms of Euclidean distance. So if you draw a diagram of what is going on, least squares seems like a reasonable choice. The geometry also helps you understand things like "degrees of freedom". So, may overlapping reasons. | ||
| ▲ | mr_toad 3 days ago | parent [-] | |
Best meaning the ‘least variance’, where variance is calculated based on the sum of squared residuals. There is a circularity in that definition. | ||