27 November 2016
Statistical significance has always been the “gold standard” for presenting compelling scientific evidence. But recently the social sciences, and even medical research, have been going through something of a crisis. Scientific facts were traditionally considered “established” once significant findings were published: we took those results to be “real,” as distinguished from chance occurrences, or false positives.
In 2005 John Ioannidis wrote a damning paper in PLoS Medicine titled “Why Most Published Research Findings Are False.” A later large-scale effort in experimental psychology attempted to replicate established results, and returned a shocking reproducibility rate of 38%. This triggered a methodological crisis that is by no means confined to biomedicine or psychology. Setting aside outright fraudsters like Diederik Stapel, who fabricated results, most of these failures have a different cause: poor research practices.
Part of the problem is that many people find statistics hard, and decision theory is often poorly understood. Even when findings are formulated with adequate caution and stated as probabilities, it is mentally easier to assume that a statistically significant finding means the effect has been established. The conventional threshold for academic publication has long been a Type I (alpha) error level of 5%. From a decision-theoretic point of view that may be too liberal in some cases and too strict in others, so relying on a single 5% threshold cannot possibly be sensible. Yet 5% has been commonly accepted as the yardstick for “significance,” for declaring an effect “real” as opposed to “chance.”
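What that 5% alpha level actually buys you is easy to demonstrate with a minimal simulation (a hypothetical sketch, not from any of the studies discussed here): if we run many experiments in which the null hypothesis is true by construction, about one in twenty will still come out “significant.”

```python
import random
from statistics import NormalDist

random.seed(42)

ALPHA = 0.05          # conventional Type I error threshold
N_EXPERIMENTS = 10_000
N_FLIPS = 100

def z_test_p_value(successes, n, p0=0.5):
    """Two-sided z-test of a binomial proportion against p0 (normal approx.)."""
    z = (successes / n - p0) / (p0 * (1 - p0) / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Every experiment flips a fair coin, so the null hypothesis is always true:
# any "significant" result is, by construction, a false positive.
false_positives = sum(
    z_test_p_value(sum(random.random() < 0.5 for _ in range(N_FLIPS)), N_FLIPS) < ALPHA
    for _ in range(N_EXPERIMENTS)
)
print(f"False positive rate: {false_positives / N_EXPERIMENTS:.3f}")  # roughly ALPHA
```

The rate hovers near 5% no matter how many experiments we run; the threshold caps the false-positive rate per test, it does not make any single “significant” result true.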
Although questions have been raised about the suitability of Fisher-style significance testing as opposed to Bayesian statistics, the real problem does not lie with the choice of any particular threshold per se. The root cause of the current scientific crisis lies with academic journals that will typically only publish significant (sic) findings, when the scientific method calls for repeated non-chance (statistically significant) findings before a fact is considered “established.” Yet journals will happily publish original research results even when no attempt at replication has been made.
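The consequence of publishing only significant findings can be made concrete with a little back-of-the-envelope arithmetic in the spirit of the Ioannidis argument. The rates below are illustrative assumptions, not figures from his paper:

```python
# Hypothetical rates, chosen for illustration only:
prior_true = 0.10   # fraction of tested hypotheses that are actually true
power = 0.80        # P(significant | true effect)
alpha = 0.05        # P(significant | no effect) -- the Type I error level

# If only significant results reach the journals, the published record is:
published_true = prior_true * power            # true positives
published_false = (1 - prior_true) * alpha     # false positives
fraction_false = published_false / (published_true + published_false)
print(f"Fraction of published findings that are false: {fraction_false:.0%}")
```

With these (plausible but assumed) numbers, 36% of the published literature consists of false positives, even though every individual study honestly applied a 5% threshold. Replication attempts would weed most of these out; a publication filter that ignores replication does not.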
In the age of Big Data, where massive volumes of data make spurious “significant” findings even more likely, data scientists have an additional obligation to ensure that their findings are properly understood and interpreted. Statistical significance isn’t what it used to be…