40,000 fMRI Studies Are Not Wrong

A common saying in the media used to be “If it bleeds, it leads.” Nowadays, people talk about click-bait headlines. The more salacious and tantalizing the headline, the more likely people are to click on the article, driving up revenue from web-based advertising. The newsworthiness and veracity of the story are inconsequential. Unfortunately, click-bait language is now entering academia.

Recently, a paper titled “Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates,” by Anders Eklund, Thomas E. Nichols, and Hans Knutsson, was published in the Proceedings of the National Academy of Sciences. The paper uses resting-state fMRI data to empirically estimate the familywise error rate of group statistical analyses that use a cluster threshold to control for multiple comparisons. Where 5% false positives were expected, the study found much higher rates, in some cases up to 70%. The authors state that the “results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results.”
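To make the idea of empirically estimating a familywise error rate concrete, here is a toy sketch in Python. It is nothing like the authors' actual pipeline, and every parameter is invented: it runs many one-sample t-tests on smoothed pure-noise “group studies,” applies a naive cluster-extent rule, and counts how often at least one cluster survives anywhere.

```python
import numpy as np
from scipy import ndimage, stats

rng = np.random.default_rng(1)
n_analyses, n_subjects, shape = 500, 20, (30, 30)   # toy sizes
voxel_p, cluster_extent = 0.01, 12                  # illustrative thresholds

false_positives = 0
for _ in range(n_analyses):
    # One "group study": per-subject pure-noise maps, spatially smoothed
    # so that neighboring voxels are correlated, as in real fMRI.
    data = rng.standard_normal((n_subjects, *shape))
    data = ndimage.gaussian_filter(data, sigma=(0, 1.5, 1.5))
    _, p = stats.ttest_1samp(data, popmean=0.0, axis=0)
    # Cluster-extent inference: threshold voxelwise, then keep clusters
    # larger than an arbitrary extent cutoff.
    labels, n_clusters = ndimage.label(p < voxel_p)
    if n_clusters:
        sizes = ndimage.sum(np.ones(shape), labels, range(1, n_clusters + 1))
        if sizes.max() >= cluster_extent:
            false_positives += 1

# The data are pure noise, so any surviving cluster is a false positive;
# the fraction of analyses with one is the empirical familywise error.
print("empirical familywise error rate:", false_positives / n_analyses)
```

Unless the extent cutoff is calibrated to the data's actual smoothness, the printed rate will generally not land at the nominal 5%; that kind of mismatch, measured on real resting-state data, is what the paper quantified.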

The authors’ “40,000 studies” claim is an example of click-bait language entering academic papers, and it is a foolish (and unprofessional) claim to make. To support it, they would need to comb through hundreds of fMRI papers and determine how many actually used cluster thresholds with large voxelwise cluster-defining p-values. The authors do not do this. Nevertheless, the paper caused a firestorm in online popular-science journalism, producing headlines such as these:

Bug in fMRI software calls 15 years of research into question (Wired)

A bug in fMRI software could invalidate 15 years of brain research (Science alert)

Tens of thousands of FMRI brain studies may be flawed (Forbes)

Software faults raise questions about the validity of brain studies (Ars Technica)

15 years of brain research has been invalidated by a software bug, says Swedish scientists (International Business Times)

When big data is bad data (ZDNet)

Thousands of fMRI brain studies in doubt due to software flaws (New Scientist)


These headlines are disappointing for several reasons. First, what was discovered was not a software bug. Rather, the study found that incorrect assumptions about fMRI data lead to a higher-than-expected rate of false positives. Second, the paper did not find that big data is bad data; it used big data to find the problem. This is an example of big data helping science.


In reality, the paper reports some very important findings. The study identified two incorrect assumptions that inflate the significance of statistical analyses when cluster thresholds are used. First, the spatial autocorrelation function of fMRI data is not Gaussian. Spatial autocorrelation is a fancy way of saying that neighboring voxels (3D pixels) have similar time-course signals. It had previously been assumed that this similarity could be described by a simple Gaussian model; the Eklund study showed that the real autocorrelation has heavier tails than a Gaussian, so correlation extends farther than the model predicts. Second, the spatial smoothness of the images is not constant throughout the brain; it varies from region to region.
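To illustrate the autocorrelation point, here is a small sketch comparing the classical Gaussian-shaped autocorrelation model with a heavier-tailed Gaussian-plus-exponential mixture of the kind some software packages adopted after the paper. The FWHM and the a, b, c parameters below are invented for illustration, not taken from the paper.

```python
import numpy as np

# Voxel separations in millimeters (illustrative range only).
dist = np.linspace(0.0, 20.0, 201)

# Classical assumption: a Gaussian (squared-exponential) spatial
# autocorrelation, fully described by one smoothness value (FWHM).
fwhm = 6.0                                        # assumed smoothness, mm
sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
acf_gauss = np.exp(-dist**2 / (2.0 * sigma**2))

# Heavier-tailed alternative: a Gaussian core plus an exponential tail.
a, b, c = 0.6, 2.5, 10.0                          # made-up parameters
acf_mixed = a * np.exp(-dist**2 / (2.0 * b**2)) + (1.0 - a) * np.exp(-dist / c)

# At 15-20 mm the Gaussian model says the correlation is essentially zero,
# while the mixed model still retains a noticeable amount. That lingering
# long-range correlation is what produces unexpectedly large clusters in
# pure-noise data.
for d in (5.0, 10.0, 15.0, 20.0):
    i = int(np.argmin(np.abs(dist - d)))
    print(f"{d:4.0f} mm: gaussian={acf_gauss[i]:.4f}  mixed={acf_mixed[i]:.4f}")
```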


Both of these findings are important and should be taken into account when applying a cluster threshold. Eklund et al. suggest the use of nonparametric statistical methods, such as permutation tests, to determine the statistical significance of clusters. This is one good solution, but other solutions exist as well. Researchers could forgo cluster thresholds entirely and instead rely on false discovery rate (FDR) thresholding, which fMRI researchers have been using since at least 2002, as sketched below. It may also be possible to use stricter (smaller) cluster-defining thresholds, that is, the voxelwise p-values used to form clusters.
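As a concrete example of the FDR alternative, here is a minimal Benjamini-Hochberg sketch. The function name, the q default, and the toy data are mine for illustration; they are not taken from any fMRI toolbox.

```python
import numpy as np

def fdr_threshold(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns the largest voxelwise p-value that survives a false discovery
    rate of q, or None if no voxel survives."""
    p = np.sort(np.asarray(p_values).ravel())
    m = p.size
    # Keep p_(k) whenever p_(k) <= (k / m) * q, using the largest such k.
    passes = p <= (np.arange(1, m + 1) / m) * q
    return p[passes.nonzero()[0].max()] if passes.any() else None

# Toy usage: 10,000 "voxels", of which 200 carry a real effect.
rng = np.random.default_rng(0)
null_p = rng.uniform(size=9_800)              # true nulls: uniform p-values
signal_p = rng.beta(1.0, 50.0, size=200)      # true effects: p-values near zero
thresh = fdr_threshold(np.concatenate([null_p, signal_p]))
print("voxelwise FDR threshold:", thresh)
```

Voxels whose p-values fall at or below the returned threshold would be declared active, with the expected proportion of false discoveries among them controlled at q, rather than controlling the chance of any false positive anywhere.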


The authors make one more very important point. They state that “the fMRI community should, in our opinion, focus on validation of existing methods.” This is most definitely true. Unfortunately, researchers are generally more motivated to develop new methods rather than refine old methods. New ideas grab grants and publications in leading journals. There is not much personal incentive for scientists to validate methods and repeat studies. However, the neuroimaging community would be much better served if existing methods could be validated, especially as there are many potential clinical uses of fMRI on the horizon.


One of the authors of the PNAS paper, Thomas Nichols, recently wrote a blog post in which he recalculated the number of fMRI papers that could be affected by the cluster-threshold problem. He argued that roughly 3,500 fMRI papers are affected, not 40,000. The authors have written an erratum, which the journal has accepted. Some number of published papers most likely do rely on the faulty statistical assumptions and therefore report inflated significance for their activations. However, 3,500 is itself a rough estimate, not an exact count.


So, no, 40,000 fMRI papers are not wrong. And yet, the flurry of online articles making this hyperbolic claim has already been written and absorbed into the collective consciousness of the science-following public. I doubt we will see as many science journalists writing articles about the correction. Errata do not make for click-bait headlines.