June 5, 2016

Rethinking the value of the P

by Heather Falconer

In scientific disciplines, researchers tend to view the world through what’s referred to as “an objectivist” lens, seeing “social phenomena and their meanings [as having] an existence that is independent of social actors” (Bryman, 2004). As such, they are likely to believe that a discoverable truth exists that can be found “by inquiry carried out in a thorough and determined way” (e.g., process of elimination and/or the manipulation of variables) so as “to discover how things really are and really work” (Lincoln & Guba, 2013). In other words, we can posit an idea (i.e., present a hypothesis) and the answer can be found by any scientist following an appropriate methodology and using correct procedures.

The nature of this view can logically lead to a research methodology that is experimental, with techniques for parsing large amounts of data. It also emphasizes reproducibility and replicability.

In analyzing their data, scientific researchers often draw on statistical tools and approaches to verify and qualify what they have collected – you may have heard, for example, of ANOVA, chi-square distributions, or t-tests. Recently, one of the most prevalent statistical tests to determine what is “true” or not – the P-value – became the focus of an American Statistical Association statement. According to an article in the June 2 issue of Science, “The ASA saw misunderstanding and misuse of statistical significance as a factor in the rise in concern about the credibility of many scientific claims (sometimes called the ‘reproducibility crisis’).” (Though scientific research projects are designed to be replicable and reproducible, in reality they rarely are. You can read more about the “crisis” here.)

According to the ASA, a P-value refers to “the probability of an observed data summary (e.g., an average) and its more extreme values, given a specific mathematical model and hypothesis (including the ‘null’).” What it is not is a measure of the credibility of the conclusions drawn from the observed data. “Over time it appears the p-value has become a gatekeeper for whether work is publishable, at least in some fields,” said Jessica Utts, ASA president. “This apparent editorial bias leads to the ‘file-drawer effect,’ in which research with statistically significant outcomes are much more likely to get published, while other work that might well be just as important scientifically is never seen in print. It also leads to practices called by such names as ‘p-hacking’ and ‘data dredging’ that emphasize the search for small p-values over other statistical and scientific reasoning.”
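To make that definition concrete, here is a minimal sketch in Python. The scenario is an assumption made purely for illustration and does not come from the ASA statement: a coin is flipped 100 times and lands heads 60 times, and the "specific mathematical model" is a fair coin (the null hypothesis). The helper name `upper_tail_p_value` is hypothetical. The P-value is the probability, under that model, of seeing 60 heads or any more extreme count.

```python
from math import comb


def upper_tail_p_value(observed_heads: int, flips: int, p_heads: float = 0.5) -> float:
    """Return P(X >= observed_heads) for X ~ Binomial(flips, p_heads).

    This is a one-sided P-value: the probability of the observed count
    "and its more extreme values" under the assumed model.
    """
    return sum(
        comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(observed_heads, flips + 1)
    )


# Hypothetical data: 60 heads in 100 flips, tested against a fair coin.
p_value = upper_tail_p_value(60, 100)
print(f"one-sided p-value: {p_value:.4f}")
```

Note what the number does and does not say. It describes how surprising the data would be if the coin were fair. It is not the probability that the coin is fair, and it says nothing about whether a fair-coin model was a sensible choice in the first place.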

The ASA’s concern over the misuse of P-values and its potential corrupting effect on the data, leading to false or overstated claims, led the association to publish a statement outlining the following six principles on the use of the P-value:

  1. P-values can indicate how incompatible the data are with a specified statistical model.
  2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
  3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
  4. Proper inference requires full reporting and transparency.
  5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
  6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
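Principle 5 is easy to see with a small calculation. The sketch below is illustrative only: it assumes a two-sided z-test comparing the means of two equal-sized groups with a known, shared standard deviation, and the function name `two_sided_p_value` and all of the numbers are hypothetical. It compares a tiny effect measured in a huge sample with a large effect measured in a small one.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(mean_difference: float, sd: float, n_per_group: int) -> float:
    """Two-sided z-test P-value for a difference between two group means.

    Assumes both groups share a known standard deviation `sd` and have
    `n_per_group` observations each (a simplification for illustration).
    """
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Tiny effect (0.02 standard deviations), very large sample.
p_tiny_effect = two_sided_p_value(mean_difference=0.02, sd=1.0, n_per_group=100_000)

# Large effect (0.5 standard deviations), very small sample.
p_large_effect = two_sided_p_value(mean_difference=0.5, sd=1.0, n_per_group=10)

print(f"tiny effect, n=100000 per group: p = {p_tiny_effect:.2g}")
print(f"large effect, n=10 per group:    p = {p_large_effect:.2g}")
```

Under these assumptions the tiny effect clears the conventional 0.05 threshold easily, while the much larger effect does not. The P-value here is driven as much by sample size as by the size of the effect, which is why it cannot stand in for either effect size or importance.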

Click here to read the statement in full, including descriptions associated with each principle.

With the P-value being one of the first and most prevalent statistical tests taught to students in science, the ASA’s statement comes as a much-needed warning.

What are your experiences with statistical significance and P-values? What have you been taught about their meaning in scientific research and reporting? Weigh in in the comments below.

________________

Interested in learning more about statistics? Read our modules on Descriptive Statistics, Inferential Statistics, and Statistics in Science!

Heather Falconer holds undergraduate degrees in Graphic Arts and Environmental Science, as well as an MFA in Writing and an MLitt in Literature. She is currently completing her PhD in Rhetoric and Composition, with an emphasis on rhetoric in/and/of science. Heather has worked internationally in academic publishing as both an author and editor, and has taught a wide range of topics – from research writing to marine biology – in the public and private educational sectors.