Interpretation of study results (Part 1/2): One reproducibility crisis may hide another

Guest post by Arnaud Vaganay (Meta-Lab)


This post is the first of two dedicated to the reproducible interpretation of empirical results in the social sciences.

If you are a regular reader of this blog, chances are high that you know all about the ‘reproducibility crisis’ that has struck many fields of science over the past few years. In my experience, there is still a lot of confusion as to what it really means. In its narrowest sense, the reproducibility crisis refers to the inability, or great difficulty, that many researchers face when they attempt to reproduce a graph, a table or a single statistic using the same data and code as in the original study. A broader definition of irreproducibility refers to the difficulty in reproducing results using the same data as in the original study but the researcher’s own understanding of the analysis rather than the exact code. Regardless of the definition (broad or narrow), we should call this type of reproducibility “analytical/analytic reproducibility”, as suggested by LeBel et al.

Although the analysis of causal mechanisms, correlations and trends is an essential part of research, it is only the first of two analyses that social scientists are expected to deliver. The second analysis involves the interpretation of these results in the light of existing theories and results from previous studies. This is typically done in the ‘discussion’ section of the manuscript. I think it is safe to say that the latter type of analysis does not get nearly as much attention – from investigators, peer reviewers and readers – as the empirical analysis. As a result, discussion sections are often erratic and almost never reproducible. Rather than analysing the robustness of a theory or result based on clear criteria, discussions tend to be used to justify the new findings based on whatever study will support the authors’ claims. Such an approach is problematic for a number of reasons. Not only is it prone to confirmation bias, it also disregards the cumulative nature of science, i.e. the fact that “no single experiment, however significant in itself, can suffice for the experimental demonstration of any (natural) phenomenon” (words attributed to Ronald Fisher). In addition, it fails to properly manage the expectations of policymakers, beneficiaries and the media by neglecting to put these results in context. These groups need to understand that what worked ‘here’ may or may not work ‘there’ and if it doesn’t, then the next logical question is why.

Assuming that each study involves a mix of replication and innovation, a useful discussion is thus one that:

  • Compares and contrasts the new results with results from previous studies, bearing in mind that the closer the replication, the stronger the expectation to find a similar result; and
  • Assesses the plausibility that any major discrepancy is due to the specificity (intended or unintended) of the intervention, context or analysis, rather than to errors or biases. This is the ‘innovative’ component of the study.
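As a rough illustration of what an explicit decision rule for the first step might look like, the sketch below compares a new estimate with an estimate from a previous study using a simple z-test on their difference. Everything in it is an assumption made for illustration and not something proposed in this post: the function name `compare_estimates`, the 0.05 threshold, the example numbers, and the premise that the two estimates are independent, approximately normal and measured on the same scale.

```python
import math


def compare_estimates(new_est, new_se, prior_est, prior_se, alpha=0.05):
    """Compare two independent estimates of the same quantity.

    Hypothetical helper: flags a discrepancy when the difference between
    the estimates is larger than sampling error alone would suggest.
    """
    diff = new_est - prior_est
    # Standard error of the difference between two independent estimates.
    se_diff = math.sqrt(new_se ** 2 + prior_se ** 2)
    z = diff / se_diff
    # Two-sided p-value under a standard normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "difference": diff,
        "z": z,
        "p_value": p_value,
        "discrepant": p_value < alpha,
    }


# Made-up numbers: a new effect of 0.10 against an earlier effect of 0.30,
# each with a standard error of 0.05.
result = compare_estimates(0.10, 0.05, 0.30, 0.05)
print(result)
```

A flag of this kind only says that a discrepancy exists. Deciding whether it reflects the intervention, the context, the analysis, or an error is the second step, and it still requires judgement.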

In line with the fundamental norms of science, this comparison should be transparent and systematic. As the methodology of interpreting phenomena is called hermeneutics, it is appropriate to talk about ‘hermeneutic reproducibility’ to define the extent to which a researcher agrees with the interpretation made by another researcher. To be fair, the term was suggested by Victoria Stodden in a discussion about the different types of reproducibility.

There are a few more reasons why we should care about hermeneutic reproducibility. First, without a systematic comparison, researchers are left to discuss the meaning of their results in terms of direction (positive/negative) and statistical significance (significant/insignificant at a certain level). These measures can be helpful but they are also crude and decontextualized. Second, without clear decision rules, researchers discussing the meaning of their results can easily be victims of interpretive bias, by failing to identify relevant previous studies, failing to compare the same quantities in relevant studies or giving a different meaning to the same result. For example, two different visual presentations of the same result can lead to different interpretations. Ultimately, researchers prone to interpretive bias are likely to give greater weight to their preferred outcome. Last but not least, writing a reproducible discussion is a task that can be taught, delegated, quality controlled and improved.

My next blog post will provide some practical recommendations to enhance the hermeneutic reproducibility of empirical research. Stay tuned and feel free to send me some feedback at arnaud@meta-lab.co!


About the author: Arnaud Vaganay is the founder and Director of Meta-Lab, as well as a BITSS Catalyst.

One thought on “Interpretation of study results (Part 1/2): One reproducibility crisis may hide another”

  1. This blog brings to mind the use of Bayesian tools to build the context into the interpretation of the data. Per the Bayesian mode of thinking, the data do not have meaning all by themselves but must be interpreted in light of the context, as summarized by the prior distribution and the loss function. It is entirely possible for the same data set to point in different directions when analyzed using different priors or different loss functions.

    The priors may be summaries of other studies or they may be mostly the received wisdom or mostly the political point of view of the analyst’s intellectual clan.

    There is nothing wrong with using priors. What’s wrong is not knowing how your prior can properly be deployed and not revealing to the rest of us how you have allowed your prior to sneak into your analysis and confirm your beliefs.

    What prior, what loss function and what data would make one think that the latest increase in the Federal Funds rate by the Federal Reserve mattered??

    And by the way, the prior should come first, not something discovered with the help of the data, since otherwise the data analysis is only a means of eliciting the prior. Or maybe that is what we are all really doing with the data. It’s not science. It’s only an experience that helps us think seriously and creatively about the issue. If the data say that, there must be a reason.
