whimsicalism 3 days ago
Despite the common refrain about how different symbolic differentiation and AD are, they are actually the same thing. | ||||||||
DoctorOetker 3 days ago | parent
Not at all. There are mainly two forms of AD: forward mode (optimal when the function being differentiated has more outputs than latent parameter inputs) and reverse mode (optimal when it has more latent parameter inputs than outputs). If you don't understand why, you don't understand AD; if you do understand AD, you see why, and you also see a huge difference from symbolic differentiation.

In symbolic differentiation, the input is an expression or DAG, and the intermediate values computed along the way are themselves symbolic expressions (typically derived by hand, outside-in, the way it's taught in high school or uni), so the expression can grow exponentially with each deeper nested function; only at the end are the input coordinates substituted into the final expression to obtain the gradient. In both forward- and reverse-mode AD, the intermediate values are plain numbers, not symbolic expressions.

The third "option" is numeric differentiation, but for N latent parameter inputs this requires N+1 forward evaluations: N of the form f(x1, x2, ..., xi + delta, ..., xN) plus one reference evaluation f(x1, ..., xN). A smaller delta brings the quotient closer to the true derivative assuming infinite precision, but in practice rounding becomes erratic near such pseudo-"infinitesimal" values with real-world floats; take delta big enough to avoid that, and it's no longer the theoretical gradient.

So symbolic differentiation was destined to fail from ever-growing expression length (the chain rule), and numeric differentiation from imprecise gradients and the huge number of forward passes (N+1, many billions for current models) needed for a single (!) gradient. Reverse-mode AD gives the theoretically correct result, exact up to float rounding, with a single forward and backward pass, as opposed to N+1 passes, and without storing ever-longer strings of formulas.
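A minimal sketch of the contrast above, using dual numbers for forward-mode AD against a finite-difference quotient. All class and function names here are illustrative, not from any particular AD library: the intermediate values in the AD version are plain floats carried alongside their tangents, and the derivative comes out exact up to float rounding, while the numeric version carries a delta-dependent truncation error.

```python
import math

class Dual:
    """A number carrying a value and its derivative (tangent)."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)
    __rmul__ = __mul__

def dsin(x):
    # Chain rule applied numerically: d/dx sin(u) = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.tan)

def f(x):
    return dsin(x * x)   # f(x) = sin(x^2), so f'(x) = 2x * cos(x^2)

x0 = 1.5
exact = 2 * x0 * math.cos(x0 * x0)

# Forward-mode AD: seed the input tangent with 1.0, read f'(x0) off the output.
ad = f(Dual(x0, 1.0)).tan

# Numeric differentiation: one extra forward pass per input coordinate,
# and the answer depends on the choice of delta.
def numeric(g, x, delta):
    return (g(x + delta) - g(x)) / delta

num = numeric(lambda x: math.sin(x * x), x0, 1e-6)

print(abs(ad - exact))    # float-rounding level: effectively zero
print(abs(num - exact))   # small but clearly nonzero truncation error
```

For a function of N inputs, the numeric version repeats its extra forward pass N times, whereas reverse mode (not sketched here) would recover the whole gradient from one forward and one backward pass.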