Tuesday, 26 November 2013 00:00

What is Significant? Reevaluating the P-Value

Written by Christina Szalinski

Valen Johnson, a statistics professor at Texas A&M University, just published an analysis that indicates a P-value of 0.05 is not so convincing.
Photo Credit: Texas A&M University
You finished all your replicates, your data are entered into your favorite statistical software, and you've got your fingers crossed that the test reveals a P-value of less than 0.05. It reads 0.039 and you breathe a sigh of relief. Without that P-value, you would have been stuck with your null hypothesis, that terrible possibility that your observed effect was meaningless. Instead, with the P-value on your side, you're finally ready to publish a significant observation. That is, unless you show it to Valen Johnson, a statistics professor at Texas A&M University, who has just published an analysis in PNAS [1] that indicates your data are not so convincing.

Johnson began having second thoughts about the P-value while he was working as a statistician for clinical trials a few years ago. "Just looking at how hypothesis testing was being done, and how many drugs were passing through the 0.05 filter, it became apparent to me that there was a problem," Johnson recalled.

"I don't think the 0.05 value was ever rigorously tested or derived," Johnson said. He said the value arose in the 1920s or 1930s, when Ronald Fisher, a statistics pioneer, was doing some classical testing and arbitrarily decided he would regard a finding with a P-value of 0.05 or less as significant. Fisher then wrote Statistical Methods for Research Workers, in which he proposed the 0.05 value. Johnson said, "Biologists have since made the mistake of interpreting the P-value as the probability that the null hypothesis is true."

If the P-value were based on another kind of hypothesis test, known as the Bayesian hypothesis test, then the number would represent the probability that the null hypothesis is true, Johnson said, but the P-value scientists generally use is based on the classical hypothesis test. So Johnson modified the Bayesian procedure so that its rejection regions match those of the classical test, which let him compare the two approaches directly.
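One way to see what matching the rejection regions means is the simple case of a one-sided z-test. The sketch below assumes the relation gamma = exp(z_alpha^2 / 2) between a significance level alpha and the Bayes factor threshold gamma that rejects on exactly the same data. That formula and the function name evidence_threshold are not from this article; they are brought in for illustration, and Johnson's analysis covers other tests as well.

```python
from math import exp
from statistics import NormalDist


def evidence_threshold(alpha):
    """Bayes factor threshold whose rejection region matches a one-sided
    z-test of size alpha.

    Assumption (not stated in the article): for a one-sided z-test the
    matching threshold is gamma = exp(z_alpha**2 / 2), where z_alpha is
    the upper-alpha critical value of the standard normal distribution.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return exp(z_alpha ** 2 / 2.0)


for alpha in (0.05, 0.005, 0.001):
    print(f"alpha = {alpha}: evidence threshold of about {evidence_threshold(alpha):.1f} to 1")
```

Under this assumption, the conventional 0.05 cutoff corresponds to odds of only a few to one in favor of the alternative, while the stricter cutoffs correspond to much larger odds.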

Johnson's analysis shows that with a P-value of 0.05 there's a 17-25% chance that the null hypothesis is true. That's a probability he says is too high to reject a null hypothesis. Johnson recommends that a P-value of 0.005 be used for significant results, which implies a 2-4% chance that the null hypothesis is true. For findings declared highly significant, a P-value of 0.001 would imply less than a 1% chance that the null hypothesis is true.
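The percentages above follow from simple arithmetic once the evidence is expressed as a Bayes factor, the odds by which the data favor the alternative over the null. The sketch below assumes the null and the alternative start out equally likely, and it uses Bayes factors of roughly 3 to 5 for P = 0.05, 25 to 50 for P = 0.005, and 100 to 200 for P = 0.001. Those ranges and the name posterior_null are illustrative assumptions, not figures quoted in this article.

```python
def posterior_null(bayes_factor, prior_null=0.5):
    """Probability that the null is true after seeing the data.

    bayes_factor: odds by which the data favor the alternative over the null.
    prior_null:   probability of the null before seeing the data
                  (0.5 is an assumed 50/50 starting point).
    """
    prior_odds_alt = (1.0 - prior_null) / prior_null
    posterior_odds_alt = bayes_factor * prior_odds_alt
    return 1.0 / (1.0 + posterior_odds_alt)


# Assumed Bayes factor ranges for each P-value cutoff (illustrative only).
assumed_ranges = {0.05: (3, 5), 0.005: (25, 50), 0.001: (100, 200)}

for p_value, (low, high) in assumed_ranges.items():
    worst = posterior_null(low)   # weaker evidence, higher chance the null is true
    best = posterior_null(high)   # stronger evidence, lower chance the null is true
    print(f"P = {p_value}: null true with probability {best:.1%} to {worst:.1%}")
```

Changing prior_null shows how much the answer depends on the starting assumption: if the null is considered more likely than not before the experiment, the chance that it is still true after a P-value of 0.05 is higher than 25%.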

Johnson says the P-value problem is a primary reason for retractions and irreproducibility. "I think the situation is pretty serious and it's gone undetected so long because most of the experiments conducted in the biological sciences are never replicated," he said. "Journal editors should require a 0.005 P-value for publication and consumers of these statistics should realize that if a finding is based on a P-value of 0.05 that it is probably a false finding."

[1] Johnson VE (2013). Revised standards for statistical evidence. Proc Natl Acad Sci USA. [Epub ahead of print].

Johnson's research is supported by National Cancer Institute Award R01 CA158113.

Christina Szalinski

Christina is a science writer for the American Society for Cell Biology. She earned her Ph.D. in Cell Biology and Molecular Physiology at the University of Pittsburgh.
