kgwgk 7 hours ago
> Users are randomly assigned to one of the four layouts and you track their activity. Your hypothesis is: layout influences signup behavior.

> You plan ship the winner if the p-value for one of the layout choices falls below the conventional threshold of 0.05.
What kind of comparison between the results for the four options makes each of them a likely winner? They all rank very well in whatever metric is being used! Or maybe they are being compared with a fifth, much worse, alternative.
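The hypothesis in the first quote ("layout influences signup behavior") is about all four layouts at once, so one way to get a single p-value for it is an omnibus test across the groups instead of a p-value per layout. Below is a minimal Python sketch, assuming signups are recorded as yes/no per visitor; the counts and function names are made up for illustration. It uses a Pearson chi-square test of homogeneity and, to stay stdlib-only, the closed-form tail of the chi-square distribution with 3 degrees of freedom (four layouts minus one).

```python
import math

def chi_square_homogeneity(signups, visitors):
    """Pearson chi-square statistic for H0: every layout has the same signup rate.

    signups[i] and visitors[i] are the counts for layout i.
    Returns (statistic, degrees_of_freedom).
    """
    pooled_rate = sum(signups) / sum(visitors)
    stat = 0.0
    for s, n in zip(signups, visitors):
        expected_yes = n * pooled_rate        # expected signups under H0
        expected_no = n * (1 - pooled_rate)   # expected non-signups under H0
        stat += (s - expected_yes) ** 2 / expected_yes
        stat += ((n - s) - expected_no) ** 2 / expected_no
    return stat, len(signups) - 1

def chi2_sf_3dof(x):
    """Upper-tail probability P(X >= x) for a chi-square distribution with 3 dof."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Hypothetical counts for layouts A-D, 1,000 visitors each.
signups = [50, 55, 48, 70]
visitors = [1000, 1000, 1000, 1000]
stat, dof = chi_square_homogeneity(signups, visitors)
p_value = chi2_sf_3dof(stat)  # only valid because dof == 3 for four layouts
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.3f}")
```

With these made-up counts the statistic is about 5.6 and p is about 0.13, so the test does not single out any layout, even though D's 7% looks like a winner next to C's 4.8%.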
vjerancrnjak 7 hours ago
This kind of ranking is also not correct. You have to compare the outcomes, not their p-values. Ranking by p-values is just silly, just like ranking by the average metric is silly. Startups in general have to make decisions with high signal; thinking that 100 improvements of 1% at p=0.05 will actually compound in an environment with so much noise is delusion. I’d say doing this kind of silliness in a startup is just ceremonial: perhaps helpful long term if people feel they are doing a good job optimizing a compounding metric, even though the compounding never materializes.
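Comparing outcomes can mean reporting the estimated difference in signup rate together with an interval, rather than only whether p fell under 0.05. Below is a minimal sketch, assuming two layouts with hypothetical counts (70 of 1,000 vs 48 of 1,000) and a simple Wald interval with unpooled standard errors; `diff_in_rates_ci` is an illustrative name, not a library function.

```python
import math
from statistics import NormalDist

def diff_in_rates_ci(s_a, n_a, s_b, n_b, level=0.95):
    """Estimate rate_a - rate_b with a Wald confidence interval.

    s_* are signup counts, n_* are visitor counts. Uses unpooled standard errors.
    Returns (difference, lower, upper).
    """
    p_a, p_b = s_a / n_a, s_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    diff = p_a - p_b
    return diff, diff - z * se, diff + z * se

# Hypothetical counts: layout D vs layout C.
diff, lo, hi = diff_in_rates_ci(70, 1000, 48, 1000)
print(f"difference = {diff:.3%}, 95% CI = [{lo:.3%}, {hi:.3%}]")
```

With these counts the difference is 2.2 percentage points and the interval runs from roughly 0.1 to 4.3 points: it clears the 0.05 bar, yet it is compatible with anything from a negligible gain to a large one, which the p-value alone does not show.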