Guest post by Abel Brodeur, who is joining the economics department at the University of Ottawa
As a visiting Ph.D. student in the economics department at UC Berkeley (2013-2014), I was very fortunate to interact with many professors and Ph.D. students working on research transparency. I realized that several leading researchers in the Bay Area were deeply interested in the problems arising from data mining and specification searching. The encouragement and support I received during my stay helped me a great deal in developing a research agenda and disseminating the results of my research.
In a study forthcoming in the American Economic Journal: Applied Economics, I document a strong behavioral bias in empirical research. The paper, joint work with other economists who studied at the Paris School of Economics (M. Le, M. Sangnier, Y. Zylberberg), shows that economists respond to statistical significance thresholds and search across specifications in order to get published in top economics journals.
A standard empirical research process often consists of hypothesis testing. Consider a researcher who wants to test a hypothesis. Typically, he or she will collect data, examine the empirical correlation in the sample, and extract one key statistic: the p-value. If the p-value is above 5%, the researcher cannot reject the null hypothesis, and the research will probably not get published. If, instead, the p-value is below 5%, the research has a substantially better chance of being published.
Such selection on p-values distorts researchers' incentives. If, during the analysis, a researcher first obtains a p-value of 13%, he or she may be tempted to try variations of the initial analysis: analyzing only part of the data, say, or adopting different specifications, until the p-value falls below 5%.
This behavior has been referred to as data fishing, data dredging, or p-hacking. Importantly, if all researchers behaved this way, there would not only be too many p-values below 5% in academic journals; there would also be far too few p-values just above the 5% threshold.
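The mechanism can be illustrated with a stylized simulation (this is my own sketch for intuition, not the paper's methodology; all function names and parameters are illustrative). Each "study" tests a true null hypothesis, so an honest p-value is uniform on [0, 1]; a p-hacked study keeps re-running specifications, here modeled as fresh subsamples, until the p-value drops below 5% or the researcher gives up:

```python
import math
import random

def p_value(sample):
    """Two-sided p-value for H0: mean = 0, using a normal approximation."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    z = mean / math.sqrt(var / n)
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) is the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def honest_study(rng, n=100):
    """One test on data with no true effect: the p-value is uniform on [0, 1]."""
    return p_value([rng.gauss(0, 1) for _ in range(n)])

def hacked_study(rng, n=100, max_tries=20):
    """Try new 'specifications' (fresh samples) until p < 0.05 or give up."""
    p = honest_study(rng, n)
    tries = 1
    while p >= 0.05 and tries < max_tries:
        p = honest_study(rng, n)
        tries += 1
    return p

rng = random.Random(42)
honest = [honest_study(rng) for _ in range(2000)]
hacked = [hacked_study(rng) for _ in range(2000)]

def share(ps, lo, hi):
    """Fraction of p-values falling in [lo, hi)."""
    return sum(lo <= p < hi for p in ps) / len(ps)

print("just significant   [0.04, 0.05): honest %.3f vs hacked %.3f"
      % (share(honest, 0.04, 0.05), share(hacked, 0.04, 0.05)))
print("just insignificant [0.05, 0.15): honest %.3f vs hacked %.3f"
      % (share(honest, 0.05, 0.15), share(hacked, 0.05, 0.15)))
```

Under p-hacking, the simulated distribution piles up just below the 5% threshold and is depleted just above it, which is exactly the kind of valley-and-hump pattern the study looks for in published p-values.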
This study finds strong empirical evidence of this shortage of just-insignificant p-values. We collect all the p-values published between 2005 and 2011 in three of the most prestigious journals in economics (the American Economic Review, the Journal of Political Economy, and the Quarterly Journal of Economics) and show a strong empirical regularity: the distribution of p-values has a two-humped camel shape, with a first hump for high p-values, a valley of missing p-values between 10% and 25%, and a second hump for p-values slightly below 5% (see figure below).
There is a misallocation of p-values: about 20% of them are missing (roughly the size of the valley between the two humps). These p-values should have fallen between 10% and 25% but can instead be retrieved below 5%.
We relate this misallocation to author and paper characteristics and find that it correlates with incentives to get published: the misallocation is lower for older and tenured professors than for younger researchers. It also correlates with how important the empirical result is for publication prospects: in theoretical papers, where the empirical analysis is less crucial, the misallocation is indeed much lower. Moreover, the two-humped camel shape is less visible in articles using data from randomized controlled trials or laboratory experiments.
The external validity of our findings is unclear. Our analysis is restricted to three top economics journals, where rejection rates are high and the returns to publication are much higher than elsewhere. Some researchers with negative results may send their papers to less prestigious journals, so the distribution of tests in the universe of journals may be less biased than the distribution we observe. Negative results would then have less impact but would still contribute to the literature. Moreover, as opposed to pharmaceutical trials, the incentives for data mining in economics are essentially private (career concerns), and our findings may not translate to other disciplines.
“Star Wars: The Empirics Strike Back,” by A. Brodeur, M. Le, M. Sangnier, and Y. Zylberberg, forthcoming in the American Economic Journal: Applied Economics.
Contact: Abel Brodeur, University of Ottawa, email@example.com.