▲ | BrenBarn 13 hours ago | |
Aside from the p-values, I don't understand the reasoning behind whatever "experiment" is being used for the A/B testing. What test is being done whose result is interpreted as "A is winner"? The discussion is about those being separate comparisons, and yeah, okay, but what are they comparisons of? Each group in isolation vs. all the others? If (as the article says) the hypothesis is "layout influences signup behavior" then it seems more reasonable to do a chi-squared test on a contingency table of layout vs. signed-up-or-didn't, which would give you one p-value for "is there anything here at all". And then, if there isn't. . . it means you can just ship whatever you want! The real root cause of p-hacking is glossed over in the article: "Nobody likes arriving empty-handed to leadership meetings." This is the corporate equivalent of "no one will publish a null result", and is just as harmful here. The statistical techniques described are fine, but there's not necessarily a reason to fortify your stats against multiple comparisons rather than just accepting a null result. And you can, because of the other thing I kept thinking when reading this: you have to ship something. There isn't really a "control" condition if you're talking about building a website from scratch. So whether the result is null doesn't really matter. It's not like comparing different medicines or fertilizers or something where if none of them work you just do nothing; there is no "do nothing" option in this situation. So why not just take a simple effect measurement (e.g., proportion who signed up) and pick the layout that performs best? If that result is statistically significant, great, it means you picked the best one, and if it's not, it just means it doesn't matter which one you pick, so the one you picked is still fine. (And if you have an existing design and you're trying to see if a new one will be better, the null result just means "there's no reason to switch", which means the existing design is also fine.) |