▲ | mpolson64 | 2 hours ago
I'm no expert on chess engine development, but it's surprising to me that both Lc0 and Stockfish use SPSA for "tuning" the miscellaneous magic numbers that appear in the system, rather than other black-box optimization algorithms like Bayesian optimization or evolutionary algorithms. As far as I am aware, both of those approaches are used more often for similar tasks outside chess (e.g. hyperparameter optimization in ML training) and have much more active research communities than SPSA. Is there something special about these chess engines that makes SPSA more desirable for these use cases specifically? My intuition is that something like Bayesian optimization could yield stronger optimization results, and that the computational overhead of doing BO would be minimal compared to the time it takes to train and evaluate the models.
▲ | LPisGood | 43 minutes ago | parent | next [-]
One thing I wonder is why design of experiments (DOE) methodology is so seldom used for these things. Statisticians and operations researchers have spent a hundred years working out how to run as few experiments as possible while still identifying the parameter changes with the highest impact, with a statistical basis for trusting the selections. In the language of information theory and decision trees, these experiments are, in some sense, trying to "branch" on the entropy-minimizing variables.
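As a toy illustration of the "fewest experiments" idea (a hypothetical sketch, not anything Stockfish or Lc0 actually does), here is a 2^(3-1) fractional factorial in Python: three two-level parameters screened with 4 runs instead of the 8 a full factorial would need, by aliasing the third factor with the interaction of the first two.

    from itertools import product

    # 2^(3-1) fractional factorial: vary factors A and B over a full 2^2 design
    # and set the third factor C = A*B (the "design generator").
    # Result: 4 runs screen 3 two-level factors instead of 8; main effects are
    # confounded only with two-factor interactions (a resolution III design).
    runs = [(a, b, a * b) for a, b in product((-1, +1), repeat=2)]
    for a, b, c in runs:
        print(f"A={a:+d}  B={b:+d}  C={c:+d}")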
| ||||||||||||||
▲ | sscg13 | 2 hours ago | parent | prev [-]
Engines like Stockfish might have over 100 "search parameters" that need to be tuned. To the best of my knowledge, SPSA is preferred because its computational cost typically does not depend on the number of parameters. And if you use SPSA to, say, perform a final post-training tune of the last layers of a neural network, that could be thousands of parameters or more.
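To make the "cost does not depend on the number of parameters" point concrete, here is a minimal sketch of textbook SPSA in Python (generic SPSA, not the exact variant any engine project runs): every iteration perturbs all parameters at once with a random ±1 vector and needs only two evaluations of the objective, however many parameters there are. In engine tuning the objective would be something like the (noisy) score of a match between the two perturbed configurations, and SPSA tolerates that noise.

    import numpy as np

    def spsa_minimize(f, theta, iterations=1000, a=0.1, c=0.1):
        # Minimize a (possibly noisy) objective f with plain SPSA.
        # Each iteration costs exactly two evaluations of f, no matter
        # how many entries theta has.
        theta = np.asarray(theta, dtype=float)
        for k in range(1, iterations + 1):
            ak = a / k ** 0.602      # standard gain-decay schedules (Spall)
            ck = c / k ** 0.101
            delta = np.random.choice([-1.0, 1.0], size=theta.shape)  # simultaneous +/-1 perturbation
            y_plus = f(theta + ck * delta)
            y_minus = f(theta - ck * delta)
            ghat = (y_plus - y_minus) / (2.0 * ck * delta)  # gradient estimate for every parameter
            theta = theta - ak * ghat
        return theta

    # Toy usage: 200 "parameters", still only 2 objective evaluations per step.
    best = spsa_minimize(lambda x: float(np.sum((x - 3.0) ** 2)), np.zeros(200))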
| ||||||||||||||