macleginn 5 days ago

If we don't subtract from the second branch, the loss has a jump discontinuity at x = 1, so its derivative is not well-defined there. The value of the loss will also jump at that point, which, for one thing, makes the errors harder to inspect.
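
For concreteness, a minimal sketch (assuming a Huber-style piecewise loss with the branch switch at x = 1 and a 0.5 offset; the actual function under discussion may differ):

    import numpy as np

    def loss_continuous(x):
        # 0.5 is subtracted so the two branches meet at x = 1
        return np.where(np.abs(x) <= 1, 0.5 * x**2, np.abs(x) - 0.5)

    def loss_discontinuous(x):
        # no subtraction in the second branch
        return np.where(np.abs(x) <= 1, 0.5 * x**2, np.abs(x))

    eps = 1e-6
    print(loss_continuous(1 - eps), loss_continuous(1 + eps))        # ~0.5, ~0.5
    print(loss_discontinuous(1 - eps), loss_discontinuous(1 + eps))  # ~0.5, ~1.0 (jump)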

WithinReason 5 days ago

No, that's not how backprop works. There will be no discontinuity in a backpropagated gradient.
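
To illustrate (a sketch reusing the same hypothetical piecewise loss as above, not the actual code under discussion): the gradient of torch.where flows only through the selected branch, so the constant offset, and with it the jump, never shows up in the backpropagated gradient.

    import torch

    def loss_discontinuous(x):
        return torch.where(x.abs() <= 1, 0.5 * x**2, x.abs())

    def loss_continuous(x):
        return torch.where(x.abs() <= 1, 0.5 * x**2, x.abs() - 0.5)

    for f in (loss_discontinuous, loss_continuous):
        x = torch.tensor(1.001, requires_grad=True)
        f(x).backward()
        print(x.grad)  # tensor(1.) either way: backprop never sees the jump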

macleginn 5 days ago

I did not say there would be a discontinuity in the gradient; I said that the modified loss function would not have a mathematically well-defined derivative at x = 1 because of the discontinuity in the function itself.
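
Concretely, with the hypothetical loss above that jumps from 0.5 to 1 at x = 1, the one-sided difference quotient diverges there, so no derivative exists at that point:

    \lim_{h \to 0^+} \frac{L(1+h) - L(1)}{h} = \lim_{h \to 0^+} \frac{0.5 + h}{h} = +\infty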

WithinReason 3 days ago

Which is completely irrelevant to the point I was making.