▲ | stormfather 4 days ago | |
I choose LeakyReLU vs ReLU depending on whether it's an odd day of the week, with LeakyReLU slightly favored on odd days because it's aesthetically nicer that gradients propagate through negative inputs, though I can't discern a difference. I choose sigmoid if I want to waste compute to remind myself that it converges slowly due to vanishing gradients at extreme activation levels. So it's empiricism retroactively justified by some mathematical common sense that lets me feel good about the choices. Kind of like aerodynamics. |
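For anyone who wants to see the "mathematical common sense" part in numbers, here's a rough sketch of the three gradients in plain Python. The function names are made up for illustration, and the 0.01 negative slope is just an assumed value (a common default), not anything specific to the setup described above.

```python
import math

def relu_grad(x):
    # Derivative of max(0, x); using 0 at x == 0 is a convention.
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x, slope=0.01):
    # slope=0.01 is an assumed negative slope, chosen for illustration.
    return 1.0 if x > 0 else slope

def sigmoid_grad(x):
    # d/dx sigmoid(x) = s * (1 - s); peaks at 0.25 when x == 0.
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

for x in (-10.0, -1.0, 1.0, 10.0):
    print(
        f"x={x:+5.1f}  relu={relu_grad(x):.2f}  "
        f"leaky={leaky_relu_grad(x):.2f}  sigmoid={sigmoid_grad(x):.6f}"
    )
```

ReLU passes nothing back for negative inputs, LeakyReLU passes a small constant, and the sigmoid gradient never exceeds 0.25 and shrinks toward zero as the input gets large in either direction, which is the saturation behind the slow convergence.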