Interesting to see this here! This example is one of the ones mentioned in the appendix of https://arxiv.org/abs/2406.09699, specifically "4.1.2.4 When AD is algorithmically correct but numerically wrong".

If people want a tl;dr, the main idea is you can construct an ODE where the forward pass is trivial, i.e. the ODE solution going forwards is exact, but its derivative is "hard". An easy way to do this is to make it so you have for example `x' = x - y, y' = y - x`, with initial conditions x(0)=y(0). If you start with things being the same value, then the solution to the ODE is constant since `x' = y' = 0`. But the derivative of the ODE solution with respect to its initial condition is very non-zero: a small change away from equality and boom the solution explodes. You write out the expression for dy(t)/dy(0) and what you get is a non-trivial ODE that has to be solved.

What happens in this case though is that automatic differentiation "runs the same code as the primal case". I.e., automatic differentiation has the property that it walks through your code and differentiates the steps of your code, and so the control flow always matches the control flow of the non-differentiated code, slapping the derivative parts on each step. But here the primal case is trivial, so the ODE solver goes "this is easy, let's make dt as big as possible and step through this easily". But this means that `dt` is not error controlled in the derivative (the derivative, being a non-trivial ODE, needs to have a smaller dt and take small steps in order to get an accurate answer), so the derivative then gets error due to this large dt (or small number of steps). Via this construction you can make automatic differentiation give as much error as you want just by tweaking the parameters around.

Thus by this construction, automatic differentiation has no bound to the error it can give, and no this isn't floating point errors this is strictly building a function that does not converge to the correct derivative. It just goes to show that automatic differentiation of a function that computes a nice error controlled answer does not necessarily give a derivative that also has any sense of error control, that is a property that has to be proved and is not true in general. This of course is a bit of a contrived example to show the point that the error is unbounded, but then it points to real issues that can show up in user code (in fact, this example was found because a user opened an issue with a related model).

Then one thing that's noted in here too is that the Julia differential equation solvers hook into the AD system to explicitly "not do forward-mode AD correctly", incorporating the derivative terms into the time stepping adaptivity calculation, so that it is error controlled. The property that you get is that for these solvers you get more steps to the ODE when running it in AD mode than outside of AD mode, and that is a requirement if you want to ensure that the user's tolerance is respected. But that's explicitly "not correct" in terms of what forward-mode AD is, so calling forward-mode AD on the solver doesn't quite do forward mode AD of the solver's code "correctly" in order to give a more precise solution. That of course is a choice, you could instead choose to follow standard AD rules in such a code. The trade-off is between accuracy and complexity.

▲

ogogmad 3 days ago | parent | next [-]

I don't think this is a failure of Automatic Differentiation. It's a failure of assuming that the derivative of an approximation is equal to an approximation of the derivative. That's obviously nonsense, as the example f(x) = atan(10000 x)/10000, which is nearly 0, shows you. This was first pointed out here in this comment: https://news.ycombinator.com/item?id=45293066

	▲	adgjlsfhk1 3 days ago \| parent [-]
		the part that's really weird is that AD normally produces extremely accurate derivatives so the rare algorithms where it doesn't are very surprising at first.

▲

dleary 3 days ago | parent | prev [-]

Thank you for this good description.