Warning! This is an extremely boring post about statistics, and is mostly for my own use. Feel free to ignore this post!
As I continue to study statistics, I'd like to write out and better internalize a few reminders about one of the most ubiquitous features of a statistical test: the p-value (also written as P-value or
p or P). Every time I think I finally have this concept nailed I realize I forgot a detail somewhere.
The p-value is the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true. But for the p-value to be of any use, a somewhat arbitrary cut-off level (alpha) needs to be established before the test, usually at 0.05.
For example, say 100 rats are injected with a drug. For the injected rats, the mean response time to a stimulus is 1.2 seconds, with a standard deviation of 0.5 seconds (and the distribution is normal). Rats not injected have a mean response time of 1.05 seconds. The null hypothesis in this case would be that the drug has no effect. So what we want to know is: how far is 1.2 from 1.05 in this test distribution? Simply put, what is the Z-score? Well, 1.2 - 1.05 equals 0.15, and since we want this difference in terms of the standard error of the mean, we divide by 0.5/√100 = 0.05 (the standard deviation of 0.5 was given above, and we have 100 rats). This equals 3. From the 68/95/99.7 rule we know that a result 3 standard deviations away from the H0 mean has a (two-tailed) probability of (1 - 0.997) = 0.003 = p-value.
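Here's a minimal sketch of that same calculation in Python, just using the standard library (the numbers are the made-up values from the example above):

```python
from math import sqrt
from statistics import NormalDist

# Made-up example numbers from above
n = 100            # number of injected rats
sample_mean = 1.2  # mean response time of injected rats (seconds)
sigma = 0.5        # standard deviation of response times (seconds)
null_mean = 1.05   # mean response time under the null hypothesis (no effect)

# Standard error of the mean, then the Z-score
se = sigma / sqrt(n)                 # 0.5 / 10 = 0.05
z = (sample_mean - null_mean) / se   # 0.15 / 0.05 = 3.0

# Two-tailed p-value: probability of a result at least this extreme
# if the null hypothesis is true
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p = {p_value:.4f}")  # z = 3.00, p = 0.0027
```

The exact two-tailed value is about 0.0027, which the 68/95/99.7 rule of thumb rounds to 0.003.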
OK, I wrote all that to say what this p-value of 0.003 means (and what it doesn't mean). Since we decided beforehand that an acceptable alpha (the risk of rejecting the null hypothesis when in reality it's true) is 0.05, we can reject the null hypothesis. This doesn't mean that the alternative hypothesis is necessarily true! It just means that a result this extreme would be very unlikely if the null hypothesis were true; in other words, the effect found in this sample for this particular experiment probably did not happen by chance alone. That's all.
Another important thing to remember is that a low p-value alone doesn't indicate the strength of an effect or correlation. For example, say I ran two tests of the effectiveness of two drugs, and the drug A test had a p-value of 0.001 while the drug B test had a p-value of 0.04. I can't say that drug A is more effective, since maybe the drug A test simply had a much larger sample size. P-values can also change suddenly when the sample size is small: for example, flipping a coin 20 times and getting 14 heads is not statistically significant at the 0.05 level, but getting 15 heads gives a p-value of less than 0.05.
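To check that coin-flip claim, here's a quick sketch of the one-tailed binomial p-values (again standard library only; the function name is just something I made up for this example):

```python
from math import comb

def binom_p_value(heads, flips=20, prob=0.5):
    """One-tailed p-value: probability of getting at least `heads`
    heads in `flips` flips of a fair coin."""
    return sum(comb(flips, k) * prob**k * (1 - prob)**(flips - k)
               for k in range(heads, flips + 1))

print(f"P(>= 14 heads in 20 flips) = {binom_p_value(14):.3f}")  # ~0.058, not significant at 0.05
print(f"P(>= 15 heads in 20 flips) = {binom_p_value(15):.3f}")  # ~0.021, below 0.05
```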
You can't compare p-values across experiments. A p-value is an indicator that belongs to that one experiment alone.
Deciding which alpha level to use and how to interpret the p-value depends on your a priori knowledge of the subject, and on your confidence in the quality of your test (is the sample good, are all the important variables accounted for, etc.). P-values are not magic numbers that prove hypotheses!