macleginn 5 days ago
If we don't subtract from the second branch, there will be a discontinuity around x = 1, so the derivative will not be well-defined. Also, the value of the loss will jump at that point, which will make it harder to inspect the errors, for one thing.
WithinReason 5 days ago | parent
No, that's not how backprop works. There will be no discontinuity in the backpropagated gradient: autodiff differentiates whichever branch was actually taken, and a constant offset has zero derivative, so the gradient is identical with or without the subtraction. The subtraction only makes the loss *value* continuous.
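Concretely, here is a minimal sketch of both claims, assuming the loss under discussion is the smooth-L1/Huber form with its threshold at |x| = 1 (quadratic inside, linear outside). The gradient function below mirrors what reverse-mode autodiff computes: it differentiates only the branch that was evaluated, so the constant never appears in it.

```python
def smooth_l1(x, subtract_const):
    # Piecewise loss: quadratic near zero, linear beyond |x| = 1.
    # The 0.5 subtraction in the linear branch is what makes the
    # VALUE continuous at |x| = 1; it does not affect the gradient.
    if abs(x) < 1.0:
        return 0.5 * x * x
    return abs(x) - (0.5 if subtract_const else 0.0)

def grad_smooth_l1(x):
    # What backprop computes: the derivative of whichever branch ran.
    # Any additive constant in a branch has zero derivative, so the
    # gradient is the same whether or not 0.5 is subtracted.
    if abs(x) < 1.0:
        return x
    return 1.0 if x > 0 else -1.0

# Without the subtraction the loss value jumps at |x| = 1 ...
print(smooth_l1(0.999, False), smooth_l1(1.001, False))  # ~0.499 vs ~1.001
# ... but the gradient is continuous across the threshold either way:
print(grad_smooth_l1(0.999), grad_smooth_l1(1.001))  # 0.999 1.0
```

So both observations are compatible: the un-shifted loss has a visible jump in its value (annoying when eyeballing loss curves), while the gradient backprop sees is continuous regardless.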