Differentiable Programming from Scratch (thenumb.at)
107 points by sksxihve 6 days ago | 18 comments
FilosofumRex 5 days ago
Historical fact: differentiable programming was a little-known secret back in the '90s, used mainly by engineers simulating numerically stiff systems like nukes and chemicals in FORTRAN 95. It then disappeared for nearly 30 years before being rediscovered by ML/AI researchers!
constantcrying 5 days ago
> Unfortunately, finite differences have a big problem: they only compute the derivative of f in one direction. If our input is very high dimensional, computing the full gradient of f becomes computationally infeasible, as we would have to evaluate f for each dimension separately.

They are also terribly unstable numerically: f(x) and f(x+h) are very similar because h is very small, so you have to expect catastrophic cancellation. For black boxes it is the only real alternative though; you can do a bit better by taking the difference in both directions (central differences).
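A minimal sketch of the two schemes described above (function names are mine, not from the article), differentiating sin at x = 1 where the exact answer is cos(1):

```python
import math

def forward_diff(f, x, h=1e-8):
    # One-sided difference: f(x+h) and f(x) nearly cancel for small h,
    # so roughly half the significant digits are lost in the subtraction.
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h=1e-5):
    # The "both directions" trick: truncation error drops from O(h) to
    # O(h^2), so a larger h works and cancellation hurts less.
    return (f(x + h) - f(x - h)) / (2 * h)

exact = math.cos(1.0)  # d/dx sin(x) at x = 1
err_fwd = abs(forward_diff(math.sin, 1.0) - exact)
err_ctr = abs(central_diff(math.sin, 1.0) - exact)
```

Either way, one full gradient in n dimensions still costs n + 1 (or 2n) evaluations of f, which is the scaling problem the quoted passage is pointing at.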
weinzierl 5 days ago
"Now that we understand differentiation, let’s move on to programming. So far, we’ve only considered mathematical functions, but we can easily translate our perspective to programs. For simplicity, we’ll only consider pure functions, i.e. functions whose output depends solely on its parameters (no state)."

I think I've seen this notion that the constraint is purity in the documentation of autodiff libraries as well, but that can't be the full requirement, right? It is easy enough to come up with pure functions that are nowhere differentiable. So my question is: what are the actual requirements a state-of-the-art autodiff library has for the input function, and why do people focus on the purity aspect if that is probably the least of the problems?
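A tiny forward-mode sketch (hypothetical, not any particular library's API) illustrating the gap the question points at: purity does not buy differentiability, and at a kink autodiff just differentiates whichever branch the program happens to take.

```python
class Dual:
    """A value paired with its derivative (forward-mode autodiff)."""
    def __init__(self, val, dot):
        self.val, self.dot = val, dot

    def __neg__(self):
        return Dual(-self.val, -self.dot)

def my_abs(x):
    # Pure: output depends only on the argument. But abs is not
    # differentiable at 0, and autodiff doesn't notice -- it simply
    # differentiates the branch that executes.
    return x if x.val >= 0 else -x

d = my_abs(Dual(0.0, 1.0))
# d.dot is 1.0: the derivative of the x >= 0 branch, reported
# silently even though no derivative exists at this point.
```

In practice most libraries document exactly this behavior: they require the function to be composed of differentiable (or piecewise-differentiable) primitives, and at non-differentiable points they return some one-sided or subgradient-like value rather than an error.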
hwpythonner 5 days ago
I’m not deep into autodiff (I just recall some calculus from university), but the syntax in this post reminds me a lot of ML (the programming language, not machine learning). I know autodiff isn’t lambda calculus, but the expression-based structure and evaluation rules feel similar. Couldn’t this be implemented in something like ML or Clojure? Just wondering what the custom DSL adds that existing functional languages wouldn’t already support.
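Nothing in the technique requires a custom DSL; any host language with first-class functions and operator overloading will do. A minimal reverse-mode sketch in plain Python (all names are mine), where overloaded operators build the expression graph and a backward pass accumulates gradients:

```python
class Var:
    """A node in the expression graph built as arithmetic runs."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (input Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def backprop(out):
    # Order the graph topologically, then sweep it in reverse,
    # accumulating each node's adjoint into its inputs.
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)
    visit(out)
    out.grad = 1.0
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += local * v.grad

x, y = Var(3.0), Var(2.0)
z = x * y + x        # z = x*y + x, so dz/dx = y + 1, dz/dy = x
backprop(z)
```

What a DSL (or tracing framework) adds on top is mostly staging: capturing the graph once so it can be optimized, vectorized, or compiled, rather than rebuilt on every call.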