

	▲	bcyn 20 hours ago
		Great read, thanks! Could you dive a little deeper into example 2 & pre-registration? Conceptually I understand how the probability of false positives increases with the number of variants. But how does a simple act such as "pre-registration" change anything? It's not as if observing another metric that already existed changes anything about what you experimented with.
	▲	PollardsRho 20 hours ago
		If you have many metrics that could possibly be construed as "this was what we were trying to improve", that's many different opportunities for random variation to give you a false positive. If you're explicit at the start of an experiment that only a single metric counts as success, it turns any other results you get into "hmm, this is an interesting pattern that merits further exploration" and not "this is a significant result that confirms whatever I thought at the beginning." It's basically a variation on the multiple comparisons problem, but sneakier: it's easy to spend an hour going through data and, over that time, test dozens of different hypotheses. At that point, whatever p-value you'd compute for a single comparison isn't relevant, because after that many comparisons you'd expect at least one to reach an uncorrected p < 0.05 by random chance alone.
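To put rough numbers on that, here is a minimal sketch, assuming every metric is an independent test with no real effect behind it (real metrics are usually correlated, so treat this as an illustration rather than a model of any particular experiment). The function names are made up for this example. Under the null hypothesis a p-value is uniformly distributed on [0, 1], so the simulation simply draws uniform random numbers in place of running real tests.

```python
import random


def familywise_error_rate(num_metrics, alpha=0.05):
    """Chance of at least one false positive among independent null tests."""
    return 1 - (1 - alpha) ** num_metrics


def simulate_false_positive_rate(num_metrics, alpha=0.05, trials=20_000, seed=0):
    """Estimate the same quantity by simulation.

    Each simulated experiment has no real effect, so every metric's
    p-value is drawn uniformly from [0, 1]. The experiment counts as a
    "win" if any metric happens to fall below alpha.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(num_metrics)):
            wins += 1
    return wins / trials


if __name__ == "__main__":
    for k in (1, 5, 20):
        exact = familywise_error_rate(k)
        simulated = simulate_false_positive_rate(k)
        print(f"{k:>2} metrics: exact {exact:.3f}, simulated {simulated:.3f}")
```

With one pre-registered metric the false positive rate stays at the nominal 5%. With 20 metrics to pick from after the fact, the formula gives roughly 64%, under the independence assumption above.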
	▲	noodletheworld 19 hours ago
		There are many resources that will explain this rigorously if you search for the term “p-hacking”. The TLDR as I understand it is: All data has patterns. If you look hard enough, you will find something. How do you tell the difference between random variation and an actual pattern? It’s simple and rigorously correct to only search the data for a single metric; other methods, e.g. the Bonferroni correction (divide the significance threshold by the number of comparisons k), exist but are controversial [1]. Basically, are you a statistician? If not, sticking to the best practices in experimentation means your results are going to be meaningful. If you see a pattern in another metric, run another experiment. [1] - https://pmc.ncbi.nlm.nih.gov/articles/PMC1112991/
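For reference, the Bonferroni correction mentioned above can be sketched in a few lines: each p-value is compared against alpha / k, where k is the number of comparisons (equivalently, each p-value is multiplied by k and compared against alpha). The function name and the p-values below are invented for illustration and do not come from any real experiment.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return one boolean per p-value: True if it survives Bonferroni.

    The per-test threshold is alpha divided by the number of comparisons,
    which keeps the chance of any false positive at or below alpha.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


if __name__ == "__main__":
    # Hypothetical p-values for four metrics from a single experiment.
    made_up_p_values = [0.003, 0.04, 0.20, 0.011]
    # With k = 4 the per-test threshold is 0.05 / 4 = 0.0125.
    print(bonferroni_significant(made_up_p_values))
    # → [True, False, False, True]
```

Note that the second metric (p = 0.04) would look significant on its own but does not survive the correction, which is the conservative behavior the linked article [1] takes issue with.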