nextos 2 days ago

> I’ve never personally worked on a problem that I felt wasn’t adequately approached with frequentist methods

Multilevel models are one example of a problem where Bayesian methods are hard to avoid, as inference is otherwise unstable, particularly when available observations are not abundant. Multilevel models should be used more often, as shrinkage of effect sizes is important for making robust estimates.
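To make the shrinkage point concrete, here is a minimal normal-normal partial pooling sketch, an empirical Bayes shortcut rather than a full multilevel fit; all numbers are illustrative:

    # Group means are shrunk toward the grand mean; groups with
    # fewer observations are shrunk harder, stabilizing estimates.
    import numpy as np

    rng = np.random.default_rng(0)
    true_means = rng.normal(0.0, 0.5, size=8)          # group effects
    counts = np.array([3, 3, 5, 5, 10, 10, 50, 50])    # unbalanced sizes
    sigma = 1.0                                        # known within-group sd

    ybar = np.array([rng.normal(m, sigma, n).mean()
                     for m, n in zip(true_means, counts)])

    mu_hat = ybar.mean()                               # grand mean
    tau2_hat = max(ybar.var(ddof=1) - (sigma**2 / counts).mean(), 1e-6)

    # Shrinkage weight: 0 = full pooling, 1 = no pooling.
    w = tau2_hat / (tau2_hat + sigma**2 / counts)
    theta_shrunk = w * ybar + (1 - w) * mu_hat

    for n, raw, s in zip(counts, ybar, theta_shrunk):
        print(f"n={n:3d}  raw={raw:+.2f}  shrunk={s:+.2f}")

The small-n groups move a lot toward the grand mean while the n=50 groups barely move, which is exactly the behavior that kills spurious "large effects" from tiny subgroups.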

Lots of flashy results published in Nature Medicine and similar journals turn out to be statistical noise once you look at them from a rigorous perspective with adequate shrinkage. I often review for these journals, and it's a constant struggle to inject some rigor.

From a more general perspective, many frequentist methods fall prey to Lindley's paradox. In simple terms, their inference is poorly calibrated at large sample sizes: they often mistake a negligible deviation from the null for a "statistically significant" discovery, even when the evidence actually supports the null. This is quite typical in clinical trials. Spiegelhalter et al. (2003) is a great read to learn more, even if you are not interested in medical statistics [1].

[1] https://onlinelibrary.wiley.com/doi/book/10.1002/0470092602
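A toy coin version of the paradox, with made-up counts: at n = 100,000, a 50.35% heads rate is "significant" at the 5% level, while the Bayes factor, computed here with the textbook uniform prior under the alternative, actually favors the null:

    # Lindley's paradox: the p-value rejects H0 while the Bayes
    # factor (uniform prior on p under H1) supports it.
    from scipy.stats import binom, binomtest

    n, k = 100_000, 50_350                        # 50.35% heads

    pval = binomtest(k, n, p=0.5).pvalue          # frequentist test
    m0 = binom.pmf(k, n, 0.5)                     # P(data | H0: p = 1/2)
    m1 = 1.0 / (n + 1)                            # P(data | H1), p ~ Uniform(0, 1)

    print(f"two-sided p-value = {pval:.4f}")      # ~0.03, "significant"
    print(f"Bayes factor BF01 = {m0 / m1:.1f}")   # ~22, evidence FOR the null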

michaelbarton 2 days ago

Curious what you might consider “adequate shrinking”?

Horseshoe priors, partial pooling, something more?

I realize that might be highly subjective.

nextos 2 days ago

I guess this depends on the problem at hand.

But I was thinking about a typical hierarchical model with partial pooling and standard weakly informative priors.
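Something along those lines, sketched here in NumPyro (Stan or Turing would express the same model; the data and hyperparameters are illustrative choices, not recommendations):

    # Hierarchical model with partial pooling and weakly informative
    # priors. Group-level effects share a common prior.
    import jax.numpy as jnp
    from jax import random
    import numpyro
    import numpyro.distributions as dist
    from numpyro.infer import MCMC, NUTS

    def model(group, y=None, n_groups=8):
        mu = numpyro.sample("mu", dist.Normal(0.0, 5.0))      # weakly informative
        tau = numpyro.sample("tau", dist.HalfNormal(2.5))     # between-group sd
        with numpyro.plate("groups", n_groups):
            theta = numpyro.sample("theta", dist.Normal(mu, tau))
        numpyro.sample("y", dist.Normal(theta[group], 1.0), obs=y)

    group = jnp.array([0, 0, 1, 2, 2, 3, 4, 5, 6, 7])
    y = jnp.array([0.3, -0.1, 1.2, 0.5, 0.7, -0.4, 0.0, 0.9, -1.1, 0.2])

    mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
    mcmc.run(random.PRNGKey(0), group, y)
    mcmc.print_summary()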

statskier 2 days ago

I agree that Bayesian approaches to multilevel modeling are clearly quite useful and popular.

Ironically, in my personal experience, this has been one of the primary examples where, for the problems I have worked on, frequentist mixed and random effects models have worked just fine. On rare occasions I have encountered a situation where the data was particularly complex, or where I wanted to use an unusual compound probability distribution, and thought Bayesian approaches would save me. Instead, I have routinely ended up with models that never converge or take impractical amounts of time to run. Maybe it's my lack of experience, since I jump into Bayesian methods only on super hard problems. That's totally possible.

But I have found many frequentist approaches to multilevel modeling perfectly adequate. That does not, of course, mean that will hold true for everyone or all problems.

One of my hot takes is that people seriously underestimate the diversity of data problems such that many people can just have totally different experiences with methods depending on the problems they work on.

nextos 2 days ago

These days, the advantage is that a generative model can be cleanly decoupled from inference. With probabilistic programming languages such as Stan, Turing, or Pyro, it is possible to encode a model once and then perform maximum likelihood, variational Bayes, approximate Bayesian inference, or other more specialized approaches, depending on the problem at hand.

If you have experienced problems with convergence, give Stan a try. Stan is really robust, polished, and simple. Besides, models are statically typed, and Stan warns you when you do something odd.

Personally, I think once you start doing multilevel modeling to shrink estimates, there's no way back. At least in my case, I now see it everywhere. Thanks to efficient variational Bayes methods built on top of JAX, it is doable even on high-dimensional models.
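As a sketch of that decoupling, here is one toy NumPyro model handed first to NUTS and then to stochastic variational inference, without touching the model code (data and settings are illustrative):

    # One generative model, two inference backends: MCMC and VB.
    import jax.numpy as jnp
    from jax import random
    import numpyro
    import numpyro.distributions as dist
    from numpyro.infer import MCMC, NUTS, SVI, Trace_ELBO
    from numpyro.infer.autoguide import AutoNormal

    def model(y=None):
        mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
        sigma = numpyro.sample("sigma", dist.HalfNormal(5.0))
        numpyro.sample("y", dist.Normal(mu, sigma), obs=y)

    y = jnp.array([2.1, 1.7, 2.9, 2.4, 1.5, 2.2])

    # 1. Sampling-based answer via NUTS.
    mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
    mcmc.run(random.PRNGKey(0), y)

    # 2. Fast approximate answer via variational Bayes, same model.
    svi = SVI(model, AutoNormal(model), numpyro.optim.Adam(0.02), Trace_ELBO())
    svi_result = svi.run(random.PRNGKey(1), 2_000, y)
    print(svi_result.params)

Since NumPyro sits on JAX, the variational path compiles to fast vectorized code, which is what makes the high-dimensional case workable.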

jmalicki 2 days ago

Thank you for Lindley's paradox! TIL

getnormality 2 days ago

The evidence "actually supports the null" over what alternative?

In a Bayesian analysis, the result of an inference, e.g. about the fairness of a coin as in Lindley's paradox, depends completely on the distribution of the alternative specified in the analysis. The frequentist analysis, for better and worse, doesn't need to specify a distribution for the alternative.

The classic Lindley's paradox uses a uniform alternative, but there is no justification for this at all. It's not as though a coin is either perfectly fair or has a totally random heads probability. A realistic bias will be subtle, and the prior should reflect that. Something like this is often true of real-world applications too.
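The prior sensitivity is easy to see numerically: the same data gives very different Bayes factors under the uniform alternative and under alternatives concentrated near 1/2 (the Beta(500, 500) and Beta(5000, 5000) below are arbitrary stand-ins for "subtle bias" priors):

    # Bayes factor for p = 1/2 vs p ~ Beta(a, b), same coin data.
    import numpy as np
    from scipy.special import betaln, gammaln
    from scipy.stats import binom

    n, k = 100_000, 50_350

    def log_marglik(a, b):
        # log P(k | H1) with p ~ Beta(a, b): a beta-binomial marginal
        log_comb = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
        return log_comb + betaln(k + a, n - k + b) - betaln(a, b)

    log_m0 = binom.logpmf(k, n, 0.5)
    for a, b, label in [(1, 1, "uniform"),
                        (500, 500, "subtle bias"),
                        (5000, 5000, "very subtle bias")]:
        bf01 = np.exp(log_m0 - log_marglik(a, b))
        print(f"{label:16s}: BF01 = {bf01:.2f}")   # ~22, ~0.9, ~0.4

Under the uniform alternative the null is favored roughly 22:1; under priors encoding a subtle bias, the same data is roughly equivocal or leans the other way.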

_alternator_ 2 days ago

Thank you. The main problem with Bayesian statistics is that if the outcome depends on your priors, then your priors, not the data, determine the outcome.

Bayesian supporters often like to say they are just using more information by encoding it in priors, but if they had data to support their priors, they would be frequentists.

kgwgk 2 days ago

If they were doing frequentist inference, they wouldn't be using priors at all, and there is nothing frequentist about using previous data to construct prior distributions.

uoaei 2 days ago

Not true. In frequentist statistics, from the perspective of Bayesians, your prior is a point distribution derived empirically. It doesn't have the same confidence/uncertainty intervals, but it does carry an unnecessarily overconfident assumption about the nature of the data-generating process.

kgwgk 2 days ago

Not true. In frequentist statistics, from the perspective of Bayesians and non-Bayesians alike, there are no priors.

—-

Dear ChatGPT, are there priors in frequentist statistics? (Please answer with a single sentence.)

No — unlike Bayesian statistics, frequentist statistics do not use priors, as they treat parameters as fixed and rely solely on the likelihood derived from the observed data.

zozbot234 2 days ago

There are always priors; they're just "flat", uniform priors (for maximum likelihood methods). But what "flat" means is determined by the parameterization you pick for your model, which is more or less arbitrary. Bayesians would call this an uninformative prior. And you can most likely account for stronger, more informative priors within frequentist statistics by resorting to so-called "robust" methods.
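A quick numeric illustration of the parameterization dependence, with made-up Bernoulli counts: a prior flat in p and a prior flat in the log-odds give different posterior modes for the same data.

    # "Flat" depends on the parameterization: a prior flat in p is
    # not flat in the log-odds, and the posterior mode in p moves.
    import numpy as np

    k, n = 7, 10                              # made-up coin-flip counts
    p = np.linspace(1e-4, 1 - 1e-4, 100_000)
    loglik = k * np.log(p) + (n - k) * np.log(1 - p)

    # Flat prior on p: posterior mode = MLE = k/n.
    map_flat_p = p[np.argmax(loglik)]

    # Flat prior on log(p/(1-p)): back in p this is the improper
    # prior 1/(p(1-p)), and the mode shifts to (k-1)/(n-2).
    map_flat_logodds = p[np.argmax(loglik - np.log(p * (1 - p)))]

    print(f"flat in p       : {map_flat_p:.3f}")        # ~0.700
    print(f"flat in log-odds: {map_flat_logodds:.3f}")  # ~0.750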

_alternator_ 2 days ago

First, there is no such thing as an "uninformative" prior; it's a misnomer. Such priors can change drastically based on your parameterization (cf. change of variables in integration).

Second, I think the nod to robust methods is what's often called regularization in frequentist statistics. There are cases where regularization and priors lead to the same methodology (cf. L1-regularized fits and Laplace priors), but the interpretation of the results is different. Bayesians claim they get stronger results, but that's because they make what are ultimately unjustified assumptions. My point is that if those assumptions were fully justified, they would have to use frequentist methods.
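That correspondence is easy to check in one dimension: the L1-penalized estimate and the MAP under a Laplace prior land on the same point, the soft-threshold of the data (toy numbers):

    # L1 penalty <-> Laplace prior: identical point estimates.
    import numpy as np

    y, lam = 1.3, 0.5                 # one observation, penalty weight
    theta = np.linspace(-3, 3, 200_001)

    # Penalized least squares: 0.5*(y - theta)^2 + lam*|theta|
    pls = theta[np.argmin(0.5 * (y - theta) ** 2 + lam * np.abs(theta))]

    # MAP: Normal(theta, 1) likelihood times Laplace(0, 1/lam) prior
    map_est = theta[np.argmax(-0.5 * (y - theta) ** 2 - lam * np.abs(theta))]

    soft = np.sign(y) * max(abs(y) - lam, 0.0)   # closed-form answer
    print(pls, map_est, soft)                    # all ~0.8

The numbers coincide; the disagreement above is purely about how to interpret them.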

kgwgk 2 days ago

One standard way to get uninformative priors is to make them invariant under the transformation groups which are relevant given the symmetries in the problem.
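For a binomial proportion this route leads to the Jeffreys prior, proportional to 1/sqrt(p(1-p)); a quick numeric check shows it becomes flat under the variance-stabilizing transform phi = arcsin(sqrt(p)), which is the standard textbook illustration:

    # Jeffreys prior for a binomial proportion, checked under the
    # transform phi = arcsin(sqrt(p)): the density becomes constant.
    import numpy as np

    p = np.linspace(0.01, 0.99, 9)
    jeffreys_p = 1.0 / np.sqrt(p * (1 - p))        # up to normalization

    # Change of variables: density in phi = density in p * |dp/dphi|,
    # with dp/dphi = 2*sqrt(p*(1-p)).
    dens_phi = jeffreys_p * 2.0 * np.sqrt(p * (1 - p))
    print(dens_phi)                                # constant (= 2) everywhere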

kgwgk 2 days ago

It’s not true that “there are always priors”. There are no priors when you calculate the area of a triangle, because priors are not a thing in geometry. Priors are not a thing in frequentist inference either.

You may do a Bayesian calculation that looks similar to a frequentist calculation but it will be conceptually different. The result is not really comparable: a frequentist confidence interval and a Bayesian credible interval are completely different things even if the numerical values of the limits coincide.
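The frequentist side of that difference can be simulated directly: the only probability statement a 95% confidence interval licenses is about the procedure over repeated experiments, not about any single interval (toy normal-mean example with known variance):

    # A 95% CI procedure covers the FIXED true parameter in ~95% of
    # repeated experiments; it makes no claim about one interval.
    import numpy as np

    rng = np.random.default_rng(0)
    mu_true, n, reps = 3.0, 20, 10_000

    hits = 0
    for _ in range(reps):
        x = rng.normal(mu_true, 1.0, n)
        half = 1.96 / np.sqrt(n)       # known sigma = 1 for simplicity
        hits += abs(x.mean() - mu_true) < half

    print(hits / reps)                 # ~0.95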

zozbot234 2 days ago

Frequentist confidence intervals as generally interpreted are not even compatible with the likelihood principle. There's really not much of a proper foundation for that interpretation of the "numerical values".
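The standard stopping-rule example makes this concrete: 9 heads and 3 tails has the same likelihood, proportional to p^9 (1-p)^3, whether n = 12 was fixed in advance or you flipped until the third tail, yet the two designs give p-values on opposite sides of 0.05:

    # Same data, same likelihood, two p-values: a violation of the
    # likelihood principle by p-value-based inference.
    from math import comb

    # Design A: n = 12 fixed; p-value = P(>= 9 heads | p = 1/2).
    p_fixed_n = sum(comb(12, k) for k in range(9, 13)) / 2**12

    # Design B: flip until the 3rd tail; p-value = P(needing >= 12
    # flips) = P(at most 2 tails in the first 11 flips).
    p_stop_rule = sum(comb(11, k) for k in range(0, 3)) / 2**11

    print(f"fixed n = 12    : p = {p_fixed_n:.4f}")    # ~0.073
    print(f"stop at 3 tails : p = {p_stop_rule:.4f}")  # ~0.033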

kgwgk 2 days ago

What does “as generally interpreted” mean? There is one valid way to interpret confidence intervals. The point is that it’s not based on a posterior probability and there is no prior probability there either.

kgwgk 2 days ago

If you want to say that when you do a frequentist analysis, which doesn't include any concept of a prior, you get a result that has a similar form to the result of a conceptually completely different Bayesian analysis using a flat prior (definitely not "a point distribution derived empirically"), that may be correct. It remains true that there is no prior in the frequentist analysis, because priors are not part of frequentist inference at all.

uoaei 19 hours ago

Priors are not used in the construction of frequentist approaches, but that does not mean the analyses aren't isomorphic in theory.

Point distribution <=> point estimate as a sample from an initially flat distribution. A priori vs. a posteriori perspectives, which are equivalent if we take your description of frequentist statistics into account ;)

kgwgk 17 hours ago

It’s not my description of frequentist statistics. It’s the frequentist statisticians’ description. This is from Wasserman’s All of Statistics:

The statistical methods that we have discussed so far are known as frequentist (or classical) methods. The frequentist point of view is based on the following postulates:

F1 […]

F2 Parameters are fixed, unknown constants. Because they are not fluctuating, no useful probability statements can be made about parameters.

F3 […]