| ▲ | srean 3 hours ago | |||||||
Numerical overflow mostly no, but in case of exploding gradient, yes especially about coming up with a way to handle it, on your own, from scratch. After all, it took the research community some time to figure out a fix for that. But the examples you quoted were not my examples, at least not their primary movers (the NaNs could be caused by overflow but that overflow can have a deeper cause). The examples I gave have/had very different root causes at play and the fixes required some facility with maths, not to the extent that you have to be capable of discovering new math, or something so complicated as the geometry and topology of strings, but nonetheless math that requires grad school or advanced and gifted undergrad level math. Coming back to numeric overflow that you mention. I can imagine a software engineer eventually figuring out that overflow was a root cause (sometimes they will not). However there's quite a gap between overflow recognition and say knowledge of numerical analysis that will help guide a fix. You say > "literally every single example"... can be dealt without much math. I would be very keen to learn from you about how to deal with this one, say. Without much math.
This is not a gotcha, a genuine curiosity here because it is always useful to understand a solution different from your own(mine). | ||||||||
| ▲ | p1esk an hour ago | parent [-] | |||||||
Maybe I don’t understand this data labeling issue - are you talking about imbalanced classification dataset? Are hard classes under-represented or missing labels completely? | ||||||||
| ||||||||