ogrisel a day ago
Paul Werbos did not apply backprop to MLPs in the clean form described in Hinton's paper; he applied it to a class of autoregressive, non-linear parametrized functions with a much narrower application scope. Both papers are direct applications of the chain rule to compute the gradient of a multivariate function.