frumiousirc 2 days ago

L1 (sum of absolute differences) is useful because minimizing it approximates minimizing L0 (the count of nonzero entries, i.e. maximizing sparsity). The reason for the substitution is that L1 is convex and has a (sub)gradient everywhere, so minimization can be fast with conventional gradient-based methods, while minimizing L0 is a combinatorial problem and solving it is "hard" (NP-hard in general). It is also common to add an L1 term to an L2 term to bias the solution toward sparsity.
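
To make the sparsity bias concrete, here is a minimal sketch in plain Python. It assumes the simplest possible setting, a per-coordinate L2 data-fit term 0.5*(x - y)^2, and compares adding an L1 penalty against adding a squared-L2 penalty. Both have closed-form minimizers in this toy case. The names soft_threshold, ridge_shrink, and l0_count, and the values of y and lam, are made up for illustration and do not come from any library.

```python
def soft_threshold(y, lam):
    """Minimizer of 0.5*(x - y)**2 + lam*abs(x) for a scalar y."""
    if y > lam:
        return y - lam
    if y < -lam:
        return y + lam
    return 0.0  # anything with |y| <= lam is set exactly to zero


def ridge_shrink(y, lam):
    """Minimizer of 0.5*(x - y)**2 + 0.5*lam*x**2 for a scalar y."""
    return y / (1.0 + lam)  # shrinks toward zero but never reaches it


def l0_count(xs, tol=1e-12):
    """The L0 'norm': number of entries that are not (numerically) zero."""
    return sum(1 for x in xs if abs(x) > tol)


# Hypothetical data: two large entries and three small ones.
y = [3.0, -0.2, 0.05, -1.5, 0.4]
lam = 0.5

x_l1 = [soft_threshold(v, lam) for v in y]
x_l2 = [ridge_shrink(v, lam) for v in y]

print(x_l1)                             # [2.5, 0.0, 0.0, -1.0, 0.0]
print(l0_count(x_l1), l0_count(x_l2))   # 2 5
```

The L1 penalty zeroes the three small entries outright, so the L0 count drops from 5 to 2, while the squared-L2 penalty only rescales everything and leaves all five entries nonzero. In a real regression problem the minimizer is not available coordinate by coordinate like this, but the same soft-thresholding step is what proximal-gradient and coordinate-descent solvers typically apply at each iteration.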