wavemode | 4 hours ago

Can you elaborate on the difference between your statement and the author's?
sweezyjeezy | 3 hours ago

This is a subtle point that even a lot of scientists don't understand. A p-value of < 0.05 doesn't mean "there is less than a 5% chance the treatment is not effective". It means "if the treatment were only as effective as (or worse than) the original, we'd have a < 5% chance of seeing results this good". Note that the second is a weaker statement - it doesn't directly say anything about the particular experiment we ran and whether it was right or wrong with any probability, only about how extreme the final result was.

Consider this example: we don't change the treatment at all, we just update its name. We split into two groups and run the same treatment on both, under one of the two names at random. We get a p-value of 0.2 that the new one is better. Is it reasonable to say that there's a >= 80% chance it really was better, knowing that it was literally the same treatment?
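A quick way to internalize this is to simulate that renamed-treatment experiment many times. A minimal sketch, assuming a normally distributed metric, 500 users per arm, and a t-test (all invented for illustration, not from the thread):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# A/A test: both arms get literally the same treatment, so any
# "improvement" is pure noise by construction.
p_values = []
for _ in range(10_000):
    a = rng.normal(0.0, 1.0, 500)
    b = rng.normal(0.0, 1.0, 500)
    p_values.append(stats.ttest_ind(a, b).pvalue)

p_values = np.array(p_values)
# Under the null, p-values are uniform: p <= 0.2 happens ~20% of the
# time and p < 0.05 happens ~5% of the time -- with zero real effect
# in every single run.
print((p_values <= 0.2).mean())   # ~0.20
print((p_values < 0.05).mean())   # ~0.05
```

So a p-value of 0.2 is exactly what the same treatment under a new name produces a fifth of the time; it licenses no ">= 80% chance it's better" claim.
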
datastoat | 3 hours ago

Author: "5% chance of shipping something that only looked good by chance". One philosophy of statistics says that the product either is better or it isn't, and that it's meaningless to attach a probability to facts - which the author seems to be doing with the phrase "5% chance of shipping something".

Parent: "5% chance of looking as good as it did, if it were truly no better than the alternative." This accepts the premise that the product quality is a fact, and only uses probability to describe the (noisy / probabilistic) measurements, i.e. "5% chance of looking as good".

Parent is right to pick up on this if we're talking about a single product (or, in medicine, a single study evaluating a new treatment). But if we're talking about a workflow for evaluating many products, and we're prepared to consider a probability model in which some products are better than the alternative and others aren't, then the author's version is reasonable.
leoff | 13 minutes ago

Wrong: given that we got this result, what's the probability the null hypothesis is correct?

Correct: given that the null hypothesis is correct, what's the probability of us getting this result, or more extreme ones, by chance?

From Bayes you know that P(A|B) and P(B|A) are two different things.
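A worked example of the gap between the two, with a made-up prior and a made-up power (both assumptions, chosen only to show that the numbers can diverge badly):

```python
# Hypothetical inputs: how plausible the null is a priori, and how
# likely a result this extreme is under each hypothesis.
p_null = 0.9               # prior: most ideas don't work (assumed)
p_data_given_null = 0.05   # the p-value threshold
p_data_given_alt = 0.50    # power against a real effect (assumed)

# Bayes' rule: P(null | data) != P(data | null)
p_data = p_data_given_null * p_null + p_data_given_alt * (1 - p_null)
p_null_given_data = p_data_given_null * p_null / p_data

print(f"P(data | null) = {p_data_given_null:.2f}")   # 0.05
print(f"P(null | data) = {p_null_given_data:.2f}")   # ~0.47, not 0.05
```
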
kgwgk | 2 hours ago

There are a few good explanations already (also some less good and some very bad), so I'll give a simple example. You throw a coin five times and I predict the result correctly each time.

#1 You say that I have precognition powers, because the probability that I don't is less than 5%.

#2 You say that I have precognition powers, because if I didn't, the probability that I would have got the outcomes right is less than 5%.

#2 is a bad logical conclusion, but it's based on the right interpretation (while #1 is completely wrong): it's more likely that I was lucky, because precognition is very implausible to start with.
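Making this concrete: five correct calls has probability 0.5^5 ≈ 0.031 under pure guessing, comfortably below 0.05, yet with any sensible prior the posterior probability of precognition stays negligible. A sketch, with an assumed one-in-a-million prior:

```python
# Five correct coin-flip predictions: "significant" at 5%, yet the
# posterior probability of precognition remains tiny.
p_guess = 0.5 ** 5       # P(5 correct | no precognition) ~= 0.031
p_psychic = 1.0          # P(5 correct | precognition), generously
prior_psychic = 1e-6     # assumed: precognition is very implausible

posterior = (p_psychic * prior_psychic) / (
    p_psychic * prior_psychic + p_guess * (1 - prior_psychic)
)
print(f"P(5 correct | chance)    = {p_guess:.3f}")    # ~0.031
print(f"P(precognition | 5 hits) = {posterior:.5f}")  # ~0.00003
```
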
drc500free | 2 hours ago

The wrong statement is saying P(no real effect) < 5%. The correct statement is saying P(saw these results | no real effect) < 5%.

Consider two extremes, for the same 5% threshold:

1) All of their ideas for experiments are idiotic. Every single experiment is for something that simply would never work in real life. 5% of those experiments pass the threshold, and 0% of them are valid ideas.

2) All of their ideas are brilliant. Every single experiment is for something that is a perfect way to capture user needs and get them to pay more money. 100% of those experiments pass the threshold, and 100% of them are valid ideas. (p-values don't actually tell you how many valid experiments will fail, so let's just say they all pass.)

This confusion is so incredibly common in forensics that it's called the "prosecutor's fallacy."
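The two extremes reduce to a few lines of arithmetic. The helper function and its power parameter are mine, added for illustration (power defaults to 1.0 to match the "they all pass" assumption above):

```python
# Share of threshold-passing experiments that are valid ideas,
# as a function of the prior fraction of valid ideas.
def valid_share(prior_valid, power=1.0, alpha=0.05):
    passed_valid = prior_valid * power          # valid ideas that pass
    passed_invalid = (1 - prior_valid) * alpha  # noise that passes
    return passed_valid / (passed_valid + passed_invalid)

print(valid_share(0.0))  # all ideas idiotic:  0.0 -- every pass is noise
print(valid_share(1.0))  # all ideas brilliant: 1.0 -- every pass is real
```

Same alpha = 0.05 in both cases; what changes is the prior, which the p-value alone cannot tell you.
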
ghkbrew | 3 hours ago

The chance that a positive result is a false positive depends on the false positive rate of your test and on the population statistics. E.g. imagine your test has a 5% false positive rate for a disease only 1 in 1 million people has. If you test 1 million people, you expect 50,000 false positives and 1 true positive. So the chance that any one of those positive results is a false positive is 50,000/50,001, not 5/100.

Using a p-value threshold of 0.05 is similar to saying: I'm going to use a test that will call a truly negative case positive 5% of the time. The author instead claimed: the chance that a positive result is a false positive == the false positive rate.
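The same arithmetic in code, for anyone who wants to poke at the numbers (sensitivity is assumed perfect, as in the example above):

```python
population = 1_000_000
sick = 1                    # 1-in-a-million prevalence
false_positive_rate = 0.05  # the test's false positive rate

false_positives = (population - sick) * false_positive_rate  # ~50,000
true_positives = sick        # assuming the test never misses a real case

share_false = false_positives / (false_positives + true_positives)
print(f"{share_false:.5f}")  # ~0.99998 -- nowhere near 0.05
```
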
likecarter | 4 hours ago

Author: 5% chance it could be the same or worse.

Parent: 5% chance it could be the same.