duvenaud 4 days ago

I agree that Bayesian neural networks haven't been worth it in practice for many applications, but I think the main problem is that it's usually better to spend your compute training a single set of weights for a larger model, rather than doing approximate inference over weights in a smaller model. The exception is probably scientific applications where you mostly know the model, but then you don't really need a neural net anymore.

Choosing a prior is hard, but I'd say it's hard in roughly the same way choosing an architecture is: if all else fails, you can do a brute-force search, and you even have the marginal likelihood to guide you. I don't think it's the main reason people don't use BNNs much.
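
To make the "marginal likelihood as a guide" point concrete, here is a minimal sketch of a brute-force search over priors, using Bayesian linear regression as a stand-in (the evidence is in closed form there; for a real BNN it would need a variational or Laplace approximation). The candidate grid, noise level, and function names are all illustrative choices of mine:

    import numpy as np

    def log_marginal_likelihood(X, y, prior_var, noise_var):
        # Log evidence p(y | X) for Bayesian linear regression with
        # weight prior N(0, prior_var * I) and Gaussian observation noise.
        n = X.shape[0]
        C = noise_var * np.eye(n) + prior_var * X @ X.T  # marginal covariance of y
        _, logdet = np.linalg.slogdet(C)
        return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

    # Brute-force search over candidate prior variances, scored by the evidence.
    candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
    scores = {v: log_marginal_likelihood(X, y, v, noise_var=0.01) for v in candidates}
    print("best prior variance:", max(scores, key=scores.get))

The same recipe applies in principle to architecture-level choices as well; the expensive part is that each candidate needs its own evidence estimate.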

dkga 4 days ago | parent

I disagree with one conceptual point: if you are truly Bayesian, you don’t “choose” a prior; by definition you “already have” a prior, which you update with data to get to a posterior.

abm53 4 days ago | parent | next

100% correct, but there are ways to push Bayesian inference back a step to justify this sort of thing: treat the prior itself as uncertain and put a hyperprior on its parameters.

It of course makes the problem even more complex, and it likely requires further approximations to compute the posterior (or even the MAP solution).

This stretches the notion that you are still doing Bayesian reasoning, but it can still lead to useful insights.
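
As a concrete reading of "pushing inference back a step": rather than fixing the prior variance, give it a hyperprior and optimize the marginal posterior over it. A minimal sketch, again in the Bayesian linear regression setting where the evidence is tractable; the log-normal hyperprior and its scale are assumptions of mine:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def log_evidence(X, y, prior_var, noise_var=0.01):
        # Closed-form log evidence for Bayesian linear regression,
        # weights ~ N(0, prior_var * I).
        n = X.shape[0]
        C = noise_var * np.eye(n) + prior_var * X @ X.T
        _, logdet = np.linalg.slogdet(C)
        return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

    def neg_log_posterior(log_var):
        # Hyperprior (an assumption of mine): log prior_var ~ N(0, 2^2).
        # The prior variance is no longer "chosen"; it is inferred, at the
        # cost of one more (here one-dimensional) optimization problem.
        return -(log_evidence(X, y, np.exp(log_var)) - 0.5 * (log_var / 2.0) ** 2)

    res = minimize_scalar(neg_log_posterior, bounds=(-10, 10), method="bounded")
    print("MAP prior variance:", np.exp(res.x))

This only finds the MAP hyperparameter; being fully Bayesian about the hyperprior too just pushes the same question up another level, which is the extra complexity mentioned above.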

DiscourseFan 4 days ago | parent

Probably should just call it something else then; though I gather that the simplicity of Bayes' theorem belies the complexity it hides.

hgomersall 4 days ago | parent | prev | next

At some level, you have to choose something: you can't specify every level of the hierarchy.

duvenaud 3 days ago | parent | prev

Sure, instead of saying "choose" a prior, you could say "elicit". But I think in this context, focusing on a practitioner's prior knowledge is missing the point. For the sorts of problems we use NNs for, we don't usually think that the guy designing the net has important knowledge that would help make good predictions. Choosing a prior is just an engineering challenge, where one has to avoid accidentally precluding plausible hypotheses.
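
One concrete way to engineer against "accidentally precluding plausible hypotheses" is a prior predictive check: sample functions from the weight prior and look at what they can express. A minimal numpy sketch; the one-hidden-layer tanh architecture and the candidate scales are illustrative:

    import numpy as np

    def sample_prior_functions(x, weight_std, n_samples=5, hidden=50, seed=0):
        # Draw functions from a one-hidden-layer tanh MLP whose weights are
        # sampled from N(0, weight_std^2): a quick look at which hypotheses
        # a given weight prior actually supports.
        rng = np.random.default_rng(seed)
        fs = []
        for _ in range(n_samples):
            W1 = rng.normal(0.0, weight_std, size=(1, hidden))
            b1 = rng.normal(0.0, weight_std, size=hidden)
            W2 = rng.normal(0.0, weight_std / np.sqrt(hidden), size=(hidden, 1))
            fs.append((np.tanh(x[:, None] @ W1 + b1) @ W2).ravel())
        return np.array(fs)

    x = np.linspace(-3, 3, 200)
    for std in [0.1, 1.0, 10.0]:
        fs = sample_prior_functions(x, std)
        print(f"weight_std={std}: sampled outputs span [{fs.min():.2f}, {fs.max():.2f}]")

If every draw is near-constant (scale too small) or wildly saturated (scale too large), the prior has effectively ruled out the smooth, moderate functions you'd want in its support, which is exactly the kind of engineering failure to check for.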