srean 3 days ago

> It's also not necessarily immediately obvious that the derivatives ARE wrong if the implementation is wrong.

It's neither a full proof nor foolproof, but an absolute must is checking that the loss is actually decreasing. It quickly catches a common error: the sign coming out wrong in my gradient computation. It's part of the good practice one learns in grad school.
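For what it's worth, here is a minimal sketch of that check in Python with NumPy. Everything in it is an illustrative assumption rather than anyone's actual code: the least-squares loss, the names `loss`, `grad`, `grad_wrong_sign` and `loss_is_decreasing`, the synthetic data, and the step size. It runs a few plain gradient descent steps and compares the final loss to the initial one.

```python
import numpy as np


def loss(w, X, y):
    """Mean squared error (with a 1/2 factor) for a linear model."""
    r = X @ w - y
    return 0.5 * float(r @ r) / len(y)


def grad(w, X, y):
    """Correct gradient of `loss` with respect to w."""
    return X.T @ (X @ w - y) / len(y)


def grad_wrong_sign(w, X, y):
    """Deliberately buggy gradient: same magnitude, flipped sign."""
    return -grad(w, X, y)


def loss_is_decreasing(grad_fn, X, y, lr=0.01, steps=50):
    """Run plain gradient descent and report whether the loss went down.

    Returns (decreased, history), where history holds the loss per step.
    """
    w = np.zeros(X.shape[1])
    history = [loss(w, X, y)]
    for _ in range(steps):
        w = w - lr * grad_fn(w, X, y)
        history.append(loss(w, X, y))
    return history[-1] < history[0], history


# Synthetic data, made up purely for this illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

ok, _ = loss_is_decreasing(grad, X, y)
bad, _ = loss_is_decreasing(grad_wrong_sign, X, y)
print("correct gradient, loss decreased:", ok)
print("flipped sign, loss decreased:", bad)
```

With the flipped sign, every step moves uphill, so the loss grows instead of shrinking and the check fails right away. A passing check only says the update direction is a descent direction on this problem; it won't catch a gradient that is wrong in subtler ways, and a step size that is too large can make the loss rise even when the gradient is correct.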