shoo, 3 days ago:
> In addition to Alex’s relaxation-based solver, we tried several gradient descent-based solvers, which resulted in better convergence. The first was the uncmin from numericjs. Later, we found a version of uncmin that was modified by Guillermo Webster for his g9 project, which gave us even better results.

I was curious about what optimisation algorithm uncmin actually implements. The current code & docs do not cite any academic literature apart from Moré et al., 1981 ("Testing unconstrained optimization software"), but the commit messages and comments in older versions of uncmin [1] suggest it is an implementation of BFGS. In the scientific Python ecosystem, BFGS is the default nonlinear optimisation method used by scipy.optimize.minimize [2] if your problem doesn't have any constraints or bounds.

scipy.optimize cites Nocedal & Wright, Numerical Optimization (2nd ed., 2006) as a reference; BFGS is Algorithm 6.1 on page 140. After introducing quasi-Newton methods, Nocedal & Wright say "the most popular quasi-Newton algorithm is the BFGS method, named for its discoverers Broyden, Fletcher, Goldfarb, and Shanno":

> Quasi-Newton methods, like steepest descent, require only the gradient of the objective function to be supplied at each iterate. By measuring the changes in gradients, they construct a model of the objective function that is good enough to produce superlinear convergence. The improvement over steepest descent is dramatic, especially on difficult problems. Moreover, since second derivatives are not required, quasi-Newton methods are sometimes more efficient than Newton’s method. [...]

> The development of automatic differentiation techniques has made it possible to use Newton’s method without requiring users to supply second derivatives; see Chapter 8. Still, automatic differentiation tools may not be applicable in many situations, and it may be much more costly to work with second derivatives in automatic differentiation software than with the gradient. For these reasons, quasi-Newton methods remain appealing.

Interestingly, uncmin originally had a much more complex line-search implementation, which was removed and replaced with a simple one. uncmin's current line-search code agrees with "Algorithm 3.1 (Backtracking Line Search)" from Nocedal & Wright, with some implementation-defined choices: an initial step size of 1, the first Wolfe (sufficient decrease) condition checked with a constant of c_1 = 0.1, and the step size halved until that condition is satisfied.
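A minimal Python sketch of that procedure under those choices (my own illustration, not uncmin's actual code):

    import numpy as np

    def backtracking_line_search(f, x, fx, grad, p, c1=0.1):
        # Armijo backtracking: try alpha = 1, then halve until the
        # sufficient decrease (first Wolfe) condition holds:
        #   f(x + alpha p) <= f(x) + c1 * alpha * grad^T p
        alpha = 1.0
        slope = np.dot(grad, p)  # directional derivative; < 0 for a descent direction
        while f(x + alpha * p) > fx + c1 * alpha * slope:
            alpha *= 0.5
            if alpha < 1e-12:  # bail out if the step underflows
                break
        return alpha

In BFGS the search direction would be p_k = -H_k grad f(x_k), which is why trying the unit step alpha = 1 first is the natural choice.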
One part of the uncmin code that I can't immediately reconcile with equations from Nocedal & Wright is the code for updating "H1". The commit message that introduced it says "Sherman-Morrison for the BFGS update". The math seems to agree with the update for H_k, the approximate inverse Hessian, on the Wikipedia page for BFGS [3] (with a quirk where one H_k should arguably be H_k^T, but the approximation H_k is meant to be symmetric, provided BFGS doesn't break down, so that's probably fine). Rough sketches of this update, with and without a curvature guard, follow the references below.

edit: Nocedal & Wright comment on some pitfalls with simple line-search procedures when implementing BFGS:

> The performance of the BFGS method can degrade if the line search is not based on the Wolfe conditions. For example, some software implements an Armijo backtracking line search (see Section 3.1): The unit step length alpha_k = 1 is tried first and is successively decreased until the sufficient decrease condition (3.6a) is satisfied. For this strategy, there is no guarantee that the curvature condition y_k^T s_k > 0 (6.7) will be satisfied by the chosen step, since a step length greater than 1 may be required to satisfy this condition. To cope with this shortcoming, some implementations simply skip the BFGS update by setting H_{k+1} = H_k when y_k^T s_k is negative or too close to zero. This approach is not recommended, because the updates may be skipped much too often to allow H_k to capture important curvature information for the objective function f. In Chapter 18 we discuss a damped BFGS update that is a more effective strategy for coping with the case where the curvature condition (6.7) is not satisfied.

"some software implements an Armijo backtracking line search (see Section 3.1)" appears to match uncmin's simple line-search code. I do not see any curvature-condition check in the uncmin code, so this could lead to slow convergence or failure on some problems if the line search chooses a step size where y_k^T s_k <= 0.

[1] https://github.com/sloisel/numeric/blame/656fa1254be540f4287...
[2] https://docs.scipy.org/doc/scipy/reference/generated/scipy.o...
[3] https://en.wikipedia.org/wiki/Broyden%E2%80%93Fletcher%E2%80...
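For concreteness, the inverse-Hessian update in question (equation 6.17 in Nocedal & Wright, matching the H_k update on the Wikipedia page) is

    H_{k+1} = (I - rho_k s_k y_k^T) H_k (I - rho_k y_k s_k^T) + rho_k s_k s_k^T,  with rho_k = 1 / (y_k^T s_k)

and a numpy sketch of it -- my own illustration, not uncmin's actual H1 code -- looks like:

    import numpy as np

    def bfgs_inverse_update(H, s, y):
        # One BFGS update of the inverse-Hessian approximation H_k, with
        # s = x_{k+1} - x_k and y = grad f(x_{k+1}) - grad f(x_k).
        rho = 1.0 / np.dot(y, s)
        I = np.eye(len(s))
        V = I - rho * np.outer(s, y)               # I - rho s y^T
        return V @ H @ V.T + rho * np.outer(s, s)  # V.T = I - rho y s^T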
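The skip-the-update guard Nocedal & Wright describe (and caution against overusing) would only be a couple of extra lines -- again hypothetical, since uncmin has no such check:

    import numpy as np

    def bfgs_inverse_update_guarded(H, s, y, eps=1e-10):
        # Skip the update (H_{k+1} = H_k) when the curvature condition
        # y^T s > 0 (their eq. 6.7) fails. N&W note that skipping too often
        # loses curvature information, and that the damped update of their
        # Chapter 18 is the better fix.
        ys = np.dot(y, s)
        if ys <= eps:
            return H
        rho = 1.0 / ys
        I = np.eye(len(s))
        V = I - rho * np.outer(s, y)
        return V @ H @ V.T + rho * np.outer(s, s)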