fjkdlsjflkds | 3 days ago
> Non-Bayesian NN training does indeed use regularizers that are chosen subjectively, but they are then tested in validation, and the best-performing regularizer is chosen. Thus the choice is empirical, not subjective.

I'd argue the choice is still subjective, since you are still only testing over a limited (subjectively chosen) set of options. If you are doing this properly (i.e., using an independent validation set), then you can apply the same approach to a Bayesian method and obtain the same type of information ("when I use prior A vs. prior B, how does that change the generalization/out-of-bag error properties of my model?"), without violating any properties or theoretical guarantees of "Bayesianism".

> A Bayesian could try the same thing: try out several priors, and pick the one that performs best in validation. But if you pick your prior based on the data, then the classic theory about "principled quantification of uncertainty" doesn't apply any more.

If you subjectively define a set of possible priors (i.e., distributions and parameters) to test in a validation setting, then you are not picking your prior based on the data (again, assuming you have set up a leakage-free partition of your data into training and validation sets), and you are not doing empirical Bayes, so you are not violating any supposed "principled quantification of uncertainty" (if you believe that applying a standard subjective Bayesian approach provides "principled quantification of uncertainty" in the first place). See the sketch at the end of this comment for the kind of procedure I mean.

My point was that, in practice, there are ways of choosing (subjective) priors such that they provide sufficient regularization while ensuring that their impact on the results is minimized, particularly when you can assume certain things about the scale of the data (and, in the context of neural networks, you often can, due to things like normalization layers and prior scaling of inputs and outputs): "subjective" doesn't have to mean "arbitrary".

> So you're left using a computationally unwieldy procedure that doesn't offer theoretical guarantees.

I won't dispute that training NNs using Bayesian approaches is computationally unwieldy. I just don't see how evaluating a modelling decision (be it in Bayesian or non-Bayesian modelling), using a proper validation process, would violate any specific theoretical guarantee. If you can explain to me how evaluating the generalization properties of a Bayesian training recipe on an independent dataset violates any specific theoretical guarantee, I would be thankful (note: as far as I am concerned, "principled quantification of uncertainty" is not a specific theoretical guarantee).
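To make this concrete, here's a rough sketch of the procedure I have in mind, using conjugate Bayesian linear regression so the posterior is available in closed form. Everything here is illustrative: the prior scales in PRIOR_SCALES, the noise level, and the synthetic data are arbitrary choices I made up for the example, not anyone's recommended recipe.

```python
# Sketch: pick among a small, subjectively fixed menu of Gaussian priors
# by scoring the posterior predictive on an independent validation split.
# The candidate priors are fixed before looking at any data; only the
# *selection* among them uses the held-out split.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, split into train/validation with no leakage.
n, d, sigma = 200, 5, 0.5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + sigma * rng.normal(size=n)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

def fit_posterior(X, y, tau, sigma):
    """Conjugate posterior for w ~ N(0, tau^2 I), y ~ N(Xw, sigma^2 I)."""
    precision = X.T @ X / sigma**2 + np.eye(X.shape[1]) / tau**2
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / sigma**2
    return mean, cov

def val_log_pred_density(X, y, mean, cov, sigma):
    """Log posterior-predictive density of held-out points."""
    mu = X @ mean
    var = sigma**2 + np.einsum('ij,jk,ik->i', X, cov, X)
    return np.sum(-0.5 * (np.log(2 * np.pi * var) + (y - mu)**2 / var))

# A subjectively chosen, data-independent set of prior scales.
PRIOR_SCALES = [0.1, 1.0, 10.0]
scores = {}
for tau in PRIOR_SCALES:
    m, S = fit_posterior(X_tr, y_tr, tau, sigma)
    scores[tau] = val_log_pred_density(X_va, y_va, m, S, sigma)

best = max(scores, key=scores.get)
print({t: round(s, 2) for t, s in scores.items()}, "-> best tau:", best)
```

Note that nothing here conditions the prior on the training data (no marginal-likelihood maximization, no empirical Bayes); it's the same empirical model comparison you'd run for any non-Bayesian regularizer.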