simonw 19 hours ago

On the one hand, this is a very nicely presented explanation of how to run statistically significant A/B style tests.

It's worth emphasizing though that if your startup hasn't achieved product market fit yet this kind of thing is a huge waste of time! Build features, see if people use them.

noodletheworld 19 hours ago | parent | next [-]

“This kind of thing” being running AB tests at all.

There’s no reason to run AB / MVT tests at all if you’re not doing them properly.

cdavid 11 hours ago | parent | prev | next [-]

A/B testing does not have to involve micro-optimization. If done well, it can reduce the risk/cost of trying things. For example, you can A/B test something before investing in full production development. When pushing for ML-based improvements (e.g. a new ranking algo), you also want to use it.

This is why the cover of the reference A/B testing book for product dev has a hippo: A/B tests are helpful against just following the Highest Paid Person's Opinion (HiPPO). The practice is ofc more complicated, but that's more organizational/politics.

simonw 6 hours ago | parent [-]

In my own career I've only ever seen it increase the cost of development.

The vast majority of A/B test results I've seen showed no significant win in one direction or the other, in which case why did we just add six weeks of delay and twice the development work to the feature?
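For context on what "no significant win" means concretely: a standard way to call a conversion A/B test is a two-proportion z-test. A minimal sketch (the numbers below are illustrative, not from any test mentioned in this thread):

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Z statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical result: 500/10,000 control conversions vs 530/10,000 variant.
z = two_proportion_z(500, 10_000, 530, 10_000)
print(abs(z) > 1.96)  # significant at the two-sided 5% level?
```

For these made-up numbers |z| is about 0.96, well under the 1.96 cutoff, so even a 6% relative lift on 20,000 users reads as "inconclusive" - which is exactly how most tests at typical traffic levels come out.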

Usually it was because the Highest Paid Person insisted on an A/B test because they weren't confident enough to move on without that safety blanket.

There are other, much cheaper things you can do to de-risk a new feature. Build a quick prototype and run a usability test with 2-3 participants - you get more information for a fraction of the time and cost of an A/B test.

cdavid 3 hours ago | parent [-]

There are cases where A/B testing does not make sense (not enough users to measure anything sensible, etc.). But if the A/B test results were inconclusive, assuming they were done correctly, then what was the point of launching the underlying feature?

As for the HIPPO pushing for an A/B test because of lack of confidence, all I can say is that we had very different experiences, because I've almost always seen the opposite, be it in marketing, search/recommendation, etc.

simonw 3 hours ago | parent [-]

"not enough users to measure anything sensible" is definitely a big part of it: even at large established companies there are still plenty of lightly used features that don't have enough activity for that to make sense.
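The "not enough users" constraint falls out of the usual sample-size formula. A rough sketch using the normal approximation (the baseline and target rates below are illustrative assumptions):

```python
import math

def sample_size_per_arm(p_base, p_target, z_alpha=1.96, z_power=0.8416):
    """Users needed per arm to detect a shift from p_base to p_target
    (normal approximation, two-sided alpha=0.05, 80% power)."""
    p_bar = (p_base + p_target) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_power * math.sqrt(p_base * (1 - p_base)
                                 + p_target * (1 - p_target))) ** 2
    return math.ceil(num / (p_base - p_target) ** 2)

# Detecting a 5% -> 6% conversion lift:
print(sample_size_per_arm(0.05, 0.06))
```

For that one-percentage-point lift the formula asks for roughly 8,000 users per arm; a feature that sees a few hundred users a week would take months to reach that, which is why low-traffic features are poor A/B test candidates.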

A former employer had developed a strong culture of A/B testing to the point that everyone felt pressure to apply it to every problem.
