Remix clone Hacker News


	▲	kgwgk 9 hours ago
		> Users are randomly assigned to one of the four layouts and you track their activity. Your hypothesis is: layout influences signup behavior.
		> You plan to ship the winner if the p-value for one of the layout choices falls below the conventional threshold of 0.05.
		`Test: p-value; B is winner: 0.041; A is winner: 0.051; D is winner: 0.064; C is winner: 0.063`
		What kind of comparison between the results for the four options makes each of them a likely winner? They all rank very well in whatever metric is being used! Or maybe they are being compared with a fifth, much worse, alternative.
	▲	vjerancrnjak 9 hours ago
		This kind of ranking is also not correct. You have to compare the outcomes, not their p-values. Ranking by p-value is just silly, just like ranking by average metric is silly. Startups in general have to make decisions on high signal. Thinking that 100 improvements of 1%, each at p=0.05, will actually compound in an environment with so much noise is a delusion. I'd say doing this kind of silliness in a startup is just ceremonial: helpful long term if people feel they are doing a good job optimizing a compounding metric, even though the compounding never materializes.
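		To make "compare the outcomes, not their p-values" concrete, here is a rough Python sketch. It is not taken from the article under discussion. The conversion counts are made up, `diff_ci` and `compounded_lift` are hypothetical helper names, and the interval is a plain Wald approximation chosen for brevity. It reports the estimated difference in signup rate with an interval, then shows how the "100 wins of 1%" arithmetic changes if only some of those wins are real.

```python
import math

def diff_ci(conv_a, n_a, conv_b, n_b, z=1.96):
    """Estimated difference in conversion rate (b - a) with an
    approximate 95% Wald interval. Counts are supplied by the caller."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff, diff - z * se, diff + z * se

def compounded_lift(n_wins, lift, share_real=1.0):
    """Total multiplier if only `share_real` of the shipped wins are
    genuine and the rest have zero effect (an assumption for illustration)."""
    return (1 + lift) ** (n_wins * share_real)

# Made-up data: layout A converts 100/1000, layout B converts 120/1000.
diff, lo, hi = diff_ci(100, 1000, 120, 1000)
print(f"B - A = {diff:.3f}, interval [{lo:.3f}, {hi:.3f}]")

# 100 shipped "1% wins": all real vs. only half real.
all_real = compounded_lift(100, 0.01)
half_real = compounded_lift(100, 0.01, share_real=0.5)
print(f"all real: x{all_real:.2f}, half real: x{half_real:.2f}")
```

		With these made-up counts the interval for B minus A spans zero, so a 2-point difference in signup rate is not yet distinguishable from noise. The compounding figure is roughly 2.7x only if every one of the 100 wins is genuine; under the assumption that half are noise it falls to roughly 1.6x.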