srean 2 hours ago

They do. There are enormous redundancies. There's a manifold over which the parameters can vary wildly yet do zilch to the output. The nonlinear analogue of a null space.
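To make that concrete, here is a tiny numpy sketch (my own illustration, not from any particular paper): in a ReLU network you can scale one layer's weights up by a positive factor and the next layer's weights down by the same factor, and the output does not move at all, even though every parameter changed. That rescaling family is one slice of the "nonlinear null space".

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(8, 4))   # first layer weights
    W2 = rng.normal(size=(1, 8))   # second layer weights
    x = rng.normal(size=(4, 16))   # a batch of inputs

    def forward(W1, W2, x):
        return W2 @ np.maximum(W1 @ x, 0)   # ReLU between the layers

    # ReLU is positively homogeneous: relu(c*z) = c*relu(z) for c > 0,
    # so scaling W1 by c and W2 by 1/c leaves the output unchanged.
    c = 7.3
    print(np.allclose(forward(W1, W2, x), forward(W1 * c, W2 / c, x)))  # True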

Parameter instability does not worry a machine learner as much as it worries a statistician. ML folks worry about output instabilities.

The current understanding is that this overparameterization makes reaching good configurations easier while keeping the search algorithm as simple as stochastic gradient descent.

kqr 2 hours ago | parent [-]

Huh, I didn't know that! Are there efforts to automatically reduce the number of parameters once the model is trained? Or do the relationships between parameters end up too complicated to do that? I would assume such a reduction would be useful for explainability.

(Asking specifically about time series models and such.)

srean an hour ago | parent [-]

What you are looking for is the lottery ticket hypothesis for neural networks. Hit a search engine with those words and you will find examples.

https://arxiv.org/abs/1803.03635 (you can follow up on Semantic Scholar for more)

Selecting which weights to discard seems as hard as the original problem, but random decimation, or even barely informed decimation, has been observed to be effective.

On the theory side it is now understood that within the thicket of weights lurks a much, much smaller subset that produces nearly the same output.
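As a rough illustration of what "barely informed decimation" looks like in practice, here is a minimal magnitude-pruning sketch in numpy (my own toy code, not the full lottery ticket procedure, which also rewinds the surviving weights to their initial values and retrains):

    import numpy as np

    def magnitude_prune(weights, keep_fraction):
        # Keep only the top `keep_fraction` of weights by absolute value,
        # zeroing out the rest. A crude stand-in for the pruning step in
        # lottery-ticket style experiments.
        flat = np.abs(weights).ravel()
        k = max(1, int(keep_fraction * flat.size))
        threshold = np.partition(flat, -k)[-k]   # k-th largest magnitude
        mask = np.abs(weights) >= threshold
        return weights * mask, mask

    rng = np.random.default_rng(0)
    W = rng.normal(size=(256, 256))              # stand-in for a trained layer
    W_sparse, mask = magnitude_prune(W, 0.1)     # keep ~10% of the weights
    print(mask.mean())                           # roughly 0.1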

These observations are for DNNs in general. For time series specifically I don't know what the state of the art is. In general NNs are still catching up with traditional stats approaches in this domain. There are a few examples where traditional approaches have been beaten, but only a few.

One good source to watch is the M series of forecasting competitions.