Bayes’ Rule and the Paradox of Pre-Registration of RCTs

By Donald P. Green (Political Science, Columbia)

Not long ago, I attended a talk at which the presenter described the results of a large, well-crafted experiment. His results indicated that the average treatment effect was close to zero, with a small standard error. Later in the talk, however, the speaker revealed that when he partitioned the data into subgroups (men and women), the findings became “more interesting.” Evidently, the treatment interacts significantly with gender. The treatment has positive effects on men and negative effects on women.

A bit skeptical, I raised my hand to ask whether this treatment-by-covariate interaction had been anticipated by a planning document prior to the launch of the experiment. The author said that it had. The reported interaction now seemed quite convincing. Impressed by both the results and the prescient planning document, I exclaimed “Really?” The author replied, “No, not really.” The audience chuckled, and the speaker moved on. The reported interaction again struck me as rather unconvincing.

Why did the credibility of this experimental finding hinge on pre-registration? Let’s take a step back and use Bayes’ Rule to analyze the process by which prior beliefs were updated in light of new evidence. In order to keep the algebra to a bare minimum, consider a stylized example that makes use of Bayes’ Rule in its simplest form.

Let’s start by supposing that the presenter was in fact following a planning document that spelled out the interaction effect in advance. My hypothesis (H) is that this interaction effect is substantively small (i.e., close to zero). Before attending the talk, my prior belief was that there is a 50% chance that this hypothesis is true. Formally, my prior may be expressed as Pr(H) = 0.5. Next, I encounter evidence (E) that the presenter’s experiment revealed a statistically significant interaction. Suppose there is a 5% probability of obtaining a statistically significant effect given that H is true, which is to say that Pr(E|H) = 0.05. In order to apply Bayes’ Rule, we need one more quantity: the probability of observing a significant result given that H is false (denoted ~H). For a well-powered study such as this one, we may suppose that Pr(E|~H) = 1. In other words, if there were truly a substantively large effect, this study would find it.

Plugging these inputs into Bayes’ Rule allows us to calculate the posterior probability, Pr(H|E), which indicates my degree of belief in H after seeing evidence of a statistically significant finding:

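$$\Pr(H \mid E) = \frac{\Pr(E \mid H)\,\Pr(H)}{\Pr(E \mid H)\,\Pr(H) + \Pr(E \mid {\sim}H)\,\Pr({\sim}H)} = \frac{(0.05)(0.5)}{(0.05)(0.5) + (1)(0.5)} = \frac{0.025}{0.525} \approx 0.048$$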

Before seeing the experimental evidence, I thought there was a 0.50 probability of H; now, I accord H a probability of just 0.048. Having seen the presenter’s evidence of a statistically significant effect, my beliefs have changed considerably.
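The same arithmetic can be written as a small function. This is a minimal Python sketch for illustration only, not part of the original analysis; the function name `posterior` and its argument names are hypothetical, and it assumes a single binary hypothesis (H versus ~H).

```python
def posterior(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return Pr(H|E) by Bayes' Rule for a binary hypothesis H versus ~H."""
    # Probability of observing E *and* H being true.
    joint_h = p_e_given_h * prior_h
    # Total probability of observing E, summing over H and ~H.
    p_e = joint_h + p_e_given_not_h * (1.0 - prior_h)
    return joint_h / p_e


# Inputs from the pre-registered scenario in the text:
# Pr(H) = 0.5, Pr(E|H) = 0.05, Pr(E|~H) = 1.
print(f"{posterior(0.5, 0.05, 1.0):.3f}")  # prints 0.048
```

Passing 0.75 as the second argument instead gives the fishing scenario considered next, a posterior of about 0.429.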

What if the presenter obtained this result by fishing for a statistically significant estimate? I don’t know whether the presenter fished, but I do know that fishing is possible because the analysis was not guided by a planning document. Given the possibility of fishing, I re-evaluate the probability of observing a significant result even when there is a negligible effect. Above, we supposed that Pr(E|H) = 0.05; now, let’s assume that fishing yields a 75% chance of obtaining a significant result even when H is true: Pr(E|H) = 0.75. In that case,
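The 75% figure is an assumption made for the sake of the example. One simplified way to see how fishing could produce a number of that size is the familiar multiple-comparisons calculation, sketched below. It assumes, hypothetically, that a researcher runs k independent tests at the 0.05 level when the true effect is negligible and reports any that come up significant; the helper names are invented for illustration, and because subgroup tests on the same data are usually correlated, the counts are only indicative.

```python
import math

ALPHA = 0.05  # nominal significance level of each individual test


def prob_any_significant(k: int, alpha: float = ALPHA) -> float:
    """Chance that at least one of k independent tests of a true null is significant."""
    return 1.0 - (1.0 - alpha) ** k


def looks_needed(target: float, alpha: float = ALPHA) -> int:
    """Smallest number of independent looks k with prob_any_significant(k) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - alpha))


for k in (1, 5, 10, 20):
    print(f"{k:>2} looks: Pr(at least one significant | H) = {prob_any_significant(k):.2f}")

print(looks_needed(0.75))  # prints 28
```

Under these assumptions, about 28 independent looks would push Pr(E|H) from 0.05 to the 0.75 used in the text.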

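$$\Pr(H \mid E) = \frac{\Pr(E \mid H)\,\Pr(H)}{\Pr(E \mid H)\,\Pr(H) + \Pr(E \mid {\sim}H)\,\Pr({\sim}H)} = \frac{(0.75)(0.5)}{(0.75)(0.5) + (1)(0.5)} = \frac{0.375}{0.875} \approx 0.429$$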

Having seen the experimental evidence, I take the probability of H to be 0.429, which is not very different from my prior belief. In other words, when my evaluation of the evidence takes fishing into account, my priors are less influenced by the presenter’s evidence.

The broader point is that the mere possibility of fishing can undercut the persuasiveness of experimental results. When I’m confident that the researcher’s procedures are sound, Pr(E|H) is quite different from Pr(E|~H), and the experimental finding really tells me something. When I suspect fishing, Pr(E|H) moves closer to 1, and the experimental findings become less persuasive. (In an extreme case where Pr(E|H) = Pr(E|~H) = 1, the experimental findings would not change my priors about H at all.)
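This point is easiest to see in the odds form of Bayes’ Rule, where the evidence enters only through the likelihood ratio Pr(E|H) / Pr(E|~H). The sketch below is a hypothetical illustration (the function name `update` is invented here): it keeps the text’s assumptions Pr(H) = 0.5 and Pr(E|~H) = 1 and varies Pr(E|H) from the no-fishing value of 0.05 up to the extreme case of 1.

```python
PRIOR_H = 0.5          # Pr(H), as assumed in the text
P_E_GIVEN_NOT_H = 1.0  # Pr(E|~H), as assumed in the text (a well-powered study)


def update(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' Rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_h / (1.0 - prior_h)
    likelihood_ratio = p_e_given_h / p_e_given_not_h
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)  # convert odds back to a probability


for p_e_given_h in (0.05, 0.25, 0.5, 0.75, 1.0):
    lr = p_e_given_h / P_E_GIVEN_NOT_H
    post = update(PRIOR_H, p_e_given_h, P_E_GIVEN_NOT_H)
    print(f"Pr(E|H) = {p_e_given_h:.2f}  likelihood ratio = {lr:.2f}  Pr(H|E) = {post:.3f}")
```

With these inputs the posterior climbs from about 0.048 toward the prior as the likelihood ratio approaches 1, and at Pr(E|H) = Pr(E|~H) = 1 it equals the prior of 0.5 exactly: the evidence carries no information about H.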

This application of Bayes’ Rule suggests that planned comparisons may substantially increase the credibility of experimental results. The paradox is that journal reviewers and editors do not seem to accord much weight to planning documents. On the contrary, they often ask for precisely the sort of post hoc subgroup analyses that create uncertainty about fishing.

The bottom line: to make the case for pre-registration, proponents of experimental research must point out that the nominal results of an experiment are robbed of their persuasive value if readers suspect that the findings were obtained through fishing. In the short run, that means finding fault with existing practice – the lack of planning documents – so that we can improve the credibility of experimental results in the long run.


About the author:
Donald P. Green is Professor of Political Science at Columbia University. The author of four books and more than one hundred essays, Green has research interests that span a wide array of topics: voting behavior, partisanship, campaign finance, hate crime, and research methods. Much of his current work uses field experimentation to study the ways in which political campaigns mobilize and persuade voters. He recently co-authored Field Experiments: Design, Analysis, and Interpretation (W.W. Norton Press, 2012).

This post is one of a ten-part series in which we ask researchers and experts to discuss transparency in empirical social science research across disciplines. It was initially published on the CEGA blog on March 20, 2013. You can find the complete list of posts here.