bjornsing | 2 days ago
I’m not arguing that there’s something fundamentally wrong with mathematics or the scientific method. I’m arguing that the social norms around how we do science in practice have some serious flaws. Gwern points out one of them, one that IMHO is quite interesting.

EDIT: I also get the feeling that you think it’s okay to do an incorrect hypothesis test (c > 0) as long as you also look at the effect size. I don’t think it is. You need to test the c > 0.3 hypothesis to get a mathematically sound hypothesis test. How many papers do that?
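For concreteness, here is a minimal sketch of what a one-sided test of c > 0.3 could look like, using the Fisher z-transform. The data, sample size, and the 0.3 margin are all illustrative, not taken from any particular paper:

    # One-sided test of H0: rho <= 0.3 vs. H1: rho > 0.3 via Fisher's z.
    # Simulated data; the 0.3 margin and n are illustrative assumptions.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 500
    rho_true = 0.35  # hypothetical true correlation
    cov = [[1.0, rho_true], [rho_true, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

    r = np.corrcoef(x, y)[0, 1]                              # sample correlation
    z_stat = (np.arctanh(r) - np.arctanh(0.3)) * np.sqrt(n - 3)
    p_value = stats.norm.sf(z_stat)                           # one-sided p-value
    print(f"r = {r:.3f}, p for H0: rho <= 0.3 is {p_value:.3f}")

The only point of the sketch is the form of the null: it has to be the nonzero margin itself, not zero.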
syntacticsalt | a day ago
My opinion of Gwern's piece is that some of the arguments he makes don't require correlations. For example, A/B tests of differences in means using a zero-difference null hypothesis will reject the null given enough data (a small sketch of this appears below). In that A/B testing scenario, I think it's fine to test whether the difference is zero, but if the effect size is small, you shouldn't claim there's any meaningful difference. I believe the pharma literature calls this scenario equivalence testing.

Assuming a positive difference in means is desirable, I think a better idea is to test whether the change exceeds some positive margin (e.g., +5% of control), i.e., take a null hypothesis that the change is at most that margin (see the second sketch below). I believe the pharma literature calls this scenario superiority testing. I believe superiority testing is preferable to equivalence testing, and in professional settings I have made this case to managers. I have not succeeded in persuading them, and thus do the equivalence testing they request.

I don't think the idea of a zero null hypothesis is necessarily mathematically unsound. In cases like a difference in means, a zero null hypothesis is well-posed. However, I agree with you that there are better practices, like a null hypothesis that incorporates a nonzero effect.

I don't entirely agree with the arguments Gwern puts forth in the Implications section, because some of them seem at odds with one another. Betting on sparsity would imply neglecting some of the correlations he's arguing are so essential to capture. The bit about algorithmic bias strikes me as a bizarre proposition to include with little supporting evidence, especially when there are empirical examples of algorithmic bias.

What I find lacking about Gwern's piece is that it's a bit like lighting a match to widespread statistical practice and then walking away. Yes, I think null hypothesis statistical testing is widely overused, and statistical significance alone is not a good determinant of what constitutes a "discovery". I agree that modeling is hard, and that "everything is correlated" is, to an extent, true because the correlations are not literally or exactly zero. But if you're going to take the strong stance that null hypothesis statistical testing is meaningless, I believe you need to provide some kind of concrete alternative.

I don't think Gwern's piece explicitly advocates an alternative; it only hints that the alternative might be causal inference. Asking people who may not have much statistics training to leap from the frequentist concepts taught in high school to causal inference is a big ask. If Gwern isn't asking that, then I'd want to know what the suggested alternative is. Notably, Gwern does not mention testing for nonzero positive effects (e.g., in the vein of the "c > 0.3" case above). If there isn't an alternative, I'm not sure what the argument is. Don't use statistics, perhaps? It's tough to say.
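Here is a small sketch of the first point: with enough data, even a practically negligible true lift rejects the zero-difference null. The sample size and effect size are made up for illustration:

    # Zero-difference null: a +0.1% true lift, trivial in practice,
    # still comes out decisively "significant" at n = 1,000,000 per arm.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 1_000_000
    control   = rng.normal(loc=100.0, scale=10.0, size=n)
    treatment = rng.normal(loc=100.1, scale=10.0, size=n)   # +0.1% true lift

    t, p = stats.ttest_ind(treatment, control)
    print(f"difference = {treatment.mean() - control.mean():.3f}, p = {p:.1e}")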
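And a sketch of the superiority-style alternative, testing against a +5%-of-control margin rather than zero. Subtracting the margin from the treatment arm and running a one-sided two-sample t-test is one simple way to do it; the data and the 5% margin are again illustrative:

    # Superiority-style test: H0: mu_t - mu_c <= delta vs. H1: mu_t - mu_c > delta,
    # with delta = 5% of the control mean (an assumed margin).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    control   = rng.normal(loc=100.0, scale=10.0, size=5_000)
    treatment = rng.normal(loc=101.0, scale=10.0, size=5_000)   # +1% true lift

    delta = 0.05 * control.mean()
    t, p = stats.ttest_ind(treatment - delta, control, alternative="greater")
    print(f"lift = {treatment.mean() - control.mean():.2f}, "
          f"margin = {delta:.2f}, p = {p:.3f}")
    # The same +1% lift would clear a zero-difference null easily,
    # but it should not clear the +5% margin.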