frumiousirc 2 days ago
L1 (sum of absolute values) is useful because minimizing it gives an approximation to minimizing L0 (the count of nonzero entries, i.e. maximizing sparsity). The reason for the substitution is that L1 is convex and has a gradient everywhere except at zero, where a subgradient or proximal step works instead. That makes minimization fast with conventional gradient-based methods, while minimizing L0 is a combinatorial problem and solving it is "hard" (NP-hard in general). It is also common to add an L1 term to an L2 term (e.g. an L2 data-fit loss plus an L1 penalty, as in the lasso) to bias the solution toward sparsity.
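Here is a minimal sketch of the "L2 plus L1" idea. It assumes a least-squares data term, minimize 0.5*||Ax - b||² + lam*||x||₁, and solves it with ISTA (iterative shrinkage-thresholding), a proximal gradient method. Because |x| has no gradient at 0, plain gradient descent rarely lands on exact zeros. ISTA instead takes a gradient step on the smooth L2 part, then applies soft-thresholding, which sets small coefficients to exactly zero. The function names `soft_threshold` and `lasso_ista`, the fixed iteration count, and the Frobenius-norm step size are illustrative choices, not something from the comment above.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink v toward zero by t, returning exactly 0 if |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_ista(A, b, lam, steps=500):
    """Minimize 0.5*||A x - b||_2^2 + lam*||x||_1 with ISTA (proximal gradient).

    A is a list of rows (m x n), b has length m, and lam > 0 is the L1 weight.
    Illustrative sketch: fixed iteration count, no convergence check.
    """
    m, n = len(A), len(A[0])
    # Step size 1/L. The squared Frobenius norm upper-bounds ||A||_2^2,
    # the Lipschitz constant of the L2 term's gradient, so 1/L is a safe step.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(steps):
        # Residual r = A x - b
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        # Gradient of the smooth L2 part: A^T r
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step on L2, then the L1 prox (soft-threshold by lam/L)
        x = [soft_threshold(x[j] - g[j] / L, lam / L) for j in range(n)]
    return x
```

When A is the identity, the lasso solution is just soft-thresholding of b. So `lasso_ista([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [3.0, 0.2, -1.5], 0.5)` converges to approximately `[2.5, 0.0, -1.0]`, and the small middle entry is driven to exactly zero. Doing the same with an L0 penalty would instead mean searching over all 2^n possible supports.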