waldrews 4 days ago

The Conformal Prediction advocates (especially a certain prominent Twitter account) tend to rehash old frequentist-vs-Bayesian arguments with more heated rhetoric than strictly necessary. That fight has been going on for almost a century now. The Bayesian counterargument (in caricature form) would be that MLE frequentists just choose an arbitrary (flat) prior, and that penalty hyperparameters (common in NNs) are a de facto prior. The formal guarantees only have bite in the asymptotic setting, or require convoluted statements about probabilities over repeated experiments; and asymptotically, the choice of prior doesn't matter anyway.
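
A tiny illustration of the "asymptotically, the prior washes out" point, using a conjugate Beta-Binomial model (the priors and sample sizes below are arbitrary, purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    true_p = 0.3

    for n in (10, 10_000):
        k = rng.binomial(n, true_p)          # observed successes
        # Posterior means under two quite different Beta priors
        flat = (k + 1) / (n + 2)             # Beta(1, 1) prior
        informative = (k + 10) / (n + 20)    # Beta(10, 10) prior, pulled toward 0.5
        print(n, round(flat, 3), round(informative, 3))

At n = 10 the two posterior means disagree noticeably; at n = 10,000 they are practically identical.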

(I'm a moderate who uses both approaches, seeing them as parts of a general hierarchical modeling method, which means I get mocked by both sides for lack of purity.)

Bayesians are losing ground at the moment because their computational methods haven't been advanced as quickly by the GPU revolution, for reasons having to do with how hard they are to parallelize; but there's serious practical work (especially using JAX) to catch up, and the normalizing-flow literature might just get us past the limitations of MCMC for hard problems.
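
As a concrete example of the JAX-side tooling alluded to here, a minimal NumPyro sketch (the model and data are made up; NumPyro is just one of several JAX-based probabilistic programming libraries):

    import jax.numpy as jnp
    import jax.random as random
    import numpyro
    import numpyro.distributions as dist
    from numpyro.infer import MCMC, NUTS

    def model(x, y=None):
        # Toy Bayesian linear regression; stand-in for a real model
        w = numpyro.sample("w", dist.Normal(0.0, 1.0))
        b = numpyro.sample("b", dist.Normal(0.0, 1.0))
        sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
        numpyro.sample("obs", dist.Normal(w * x + b, sigma), obs=y)

    x = jnp.linspace(0.0, 1.0, 200)
    y = 2.0 * x - 0.5 + 0.1 * random.normal(random.PRNGKey(0), (200,))

    # NUTS runs on GPU automatically when JAX sees one
    mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
    mcmc.run(random.PRNGKey(1), x, y=y)
    mcmc.print_summary()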

But having said that, Conformal Prediction works as advertised for UQ as a wrapper on any point-estimating model. If you've got the data for it - and in the ML setting you do - and you don't care about things like missing-data imputation, errors in inputs, non-iid spatio-temporal and hierarchical structures, mixtures of models, evidence decay, or unbalanced data where small-data islands coexist with big data - all the complicated situations where Bayesian methods just automatically work and other methods require elaborate workarounds - then yup, use Conformal Prediction.
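
For concreteness, a minimal split-conformal sketch of the "wrapper on any point-estimating model" idea (the base model, synthetic data, and split sizes are arbitrary placeholders):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 5))
    y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=2000)

    X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.25, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

    # Conformity scores on a held-out calibration set
    scores = np.abs(y_cal - model.predict(X_cal))
    n, alpha = len(scores), 0.05
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

    # 95% predictive interval for a new point: point prediction +/- q
    x_new = rng.normal(size=(1, 5))
    pred = model.predict(x_new)[0]
    print(pred - q, pred + q)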

Calibration is also a pretty magical way to improve just about any estimator. It's cheap to do and it works (although it's hard to guarantee anything in the general case...).
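
A small sketch of what cheap post-hoc calibration can look like for classifier probabilities, using isotonic regression via scikit-learn (the base model and dataset are arbitrary):

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.metrics import brier_score_loss
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=5000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    raw = GaussianNB().fit(X_train, y_train)
    calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5).fit(X_train, y_train)

    # Lower Brier score = better-calibrated probabilities
    print(brier_score_loss(y_test, raw.predict_proba(X_test)[:, 1]))
    print(brier_score_loss(y_test, calibrated.predict_proba(X_test)[:, 1]))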

And don't forget quantile regression penalties! Awkward to apply in the NN setting, but an easy and effective way to do UQ in XGBoost world.
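
A sketch of the pinball-loss approach; scikit-learn's gradient boosting is used here for brevity, and XGBoost/LightGBM expose analogous quantile objectives:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=2000, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # One model per quantile; the 5% and 95% models together give a 90% interval
    lo = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(X_train, y_train)
    hi = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X_train, y_train)

    covered = np.mean((lo.predict(X_test) <= y_test) & (y_test <= hi.predict(X_test)))
    print(f"empirical coverage: {covered:.2f}")  # should land near 0.90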

dccsillag 4 days ago

Yeah, I know the account you're talking about; it really is a bit over the top. It's a shame - I've met a bunch of people who mentioned they were actually turned away from Conformal Prediction because of it.

> But having said that, Conformal Prediction works as advertised for UQ as a wrapper on any point-estimating model. If you've got the data for it - and in the ML setting you do - and you don't care about things like missing-data imputation, errors in inputs, non-iid spatio-temporal and hierarchical structures, mixtures of models, evidence decay, or unbalanced data where small-data islands coexist with big data - all the complicated situations where Bayesian methods just automatically work and other methods require elaborate workarounds - then yup, use Conformal Prediction.

Many of these things can actually work really well with Conformal Prediction, but the algorithms require extensions (much as, if you are doing Bayesian inference, you also need to update your model accordingly!). They generally end up being some form of reweighting to compensate for the distribution shifts (excluding the Online Conformal Prediction literature, which is another beast entirely). Also worth noting: if you have iid data, Conformal Prediction is remarkably data-efficient; as few as 20 samples are enough for it to start working for 95% predictive intervals, and with 50 samples (and almost surely unique conformity scores) it's going to match 95% coverage fairly tightly.
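
For what it's worth, those sample-size figures can be sanity-checked in a couple of lines, using the standard exact marginal coverage of split conformal with distinct scores (the n values are just the ones mentioned above):

    import math

    def split_conformal_coverage(n, alpha=0.05):
        # Exact marginal coverage with n calibration points and a.s. distinct scores;
        # if the required rank exceeds n, the interval is the whole line (coverage 1.0).
        k = math.ceil((n + 1) * (1 - alpha))
        return 1.0 if k > n else k / (n + 1)

    for n in (10, 19, 20, 50):
        print(n, round(split_conformal_coverage(n), 3))

With fewer than 19 calibration points the 95% interval degenerates to the whole line; at 20 points coverage is about 0.952, and at 50 points about 0.961.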

3abiton 4 days ago

Are we talking about N. N. Taleb? I'm curious about the Twitter persona.

GemesAS 3 days ago

Someone by the name of V. Minakhin. He has an irrational hatred of Bayesian statistics. He blocked me on Twitter for pointing out that his claim that significant companies do not use Bayesian methods is contradicted by the fact that I work for one of those companies and use Bayesian methods.

travisjungroth 3 days ago

Netflix uses Bayesian methods all over the place. In a meeting presenting new methods, I called squinting at A/B test results and considering them in the context of prior knowledge "shoot-from-the-hip cowboy Bayes". This eventually led to a Cowboy Bayes T-shirt, hat and all.

ComplexSystems 3 days ago

"Bayesian counterargument (in caricature form) would be that MLE frequentists just choose an arbitrary (flat) prior, and penalty hyperparameters (common in NN) are a de facto prior."

This has been my view for a while now. Is this not correct?

In general, I think the idea of a big "frequentist vs Bayesian" debate is silly. It's very useful to take frequentist ideas and see what they look like from a Bayesian point of view, and vice versa (when applicable). I think this is pretty much the general stance among most people in the field: it's generally expected that one will understand that regularization methods equate to certain priors, for instance, and more generally be able to relate the two perspectives wherever possible.
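
Spelling out that correspondence for the Gaussian case (σ is the noise scale, τ the prior scale; this is textbook material rather than anything specific to the thread):

    \hat{w}_{\mathrm{MAP}} = \arg\max_w \; \log p(y \mid X, w) + \log p(w)
                           = \arg\min_w \; \tfrac{1}{2\sigma^2} \|y - Xw\|^2 + \tfrac{1}{2\tau^2} \|w\|^2

i.e. ridge regression with penalty λ = σ²/τ²; a Laplace prior gives the L1 (lasso) penalty, and letting τ → ∞ (a flat prior) recovers plain MLE.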

duvenaud 3 days ago

I would argue against the idea that "MLE is just Bayes with a flat prior". The power of Bayes usually comes mainly from keeping around all the hypotheses that are compatible with the data, not from the prior. This is especially true in domains where something black-box (essentially prior-less) like a neural net has any chance of working.
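
A toy illustration of that "keep all the hypotheses" point, using conjugate Bayesian linear regression with a deliberately weak prior, so the extra predictive uncertainty comes from the posterior spread rather than from the prior (all numbers are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    sigma, tau = 0.5, 10.0                    # noise scale, (weak) prior scale
    x = rng.uniform(-1, 1, size=30)
    y = 2.0 * x + rng.normal(scale=sigma, size=30)
    Phi = np.column_stack([np.ones(30), x])   # bias + slope features

    # Posterior over weights: keeps every line that is compatible with the data
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma**2 + np.eye(2) / tau**2)
    mu = Sigma @ Phi.T @ y / sigma**2         # posterior mean; a point estimate keeps only this

    for x_star in (0.0, 5.0):                 # inside vs far outside the data range
        phi = np.array([1.0, x_star])
        point_var = sigma**2                      # all a point estimate reports
        bayes_var = sigma**2 + phi @ Sigma @ phi  # grows away from the data
        print(x_star, point_var, round(float(bayes_var), 3))

The point estimate's predictive variance is the same everywhere; the posterior-predictive variance blows up once you move away from the data, because many different weight settings remain compatible with what was observed.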