▲ | srean 3 days ago | |
> It's also not necessarily immediately obvious that the derivatives ARE wrong if the implementation is wrong. It's neither full proof or fool proof but an absolute must is a check that the loss function is reducing. It quickly detects a common error that the sign came out wrong in my gradient call. Part of good practice one learns in grad school. |