| ▲ | oakridge 3 days ago | |
One way to think of it is that each point in your data follows your model but with gaussian iid noise shifting them away. The likelihood is then product of gaussians mean shifted and rescaled by variance. Minimize the log-likelihood then becomes reducing the sum of (x-mu)^2 for each point, which is essentially least squares. | ||