By Donald P. Green (Political Science, Columbia)
Not long ago, I attended a talk at which the presenter described the results of a large, well-crafted experiment. His results indicated that the average treatment effect was close to zero, with a small standard error. Later in the talk, however, the speaker revealed that when he partitioned the data into subgroups (men and women), the findings became “more interesting.” Evidently, the treatment interacts significantly with gender: it has positive effects on men and negative effects on women.
A bit skeptical, I raised my hand to ask whether this treatment-by-covariate interaction had been anticipated by a planning document prior to the launch of the experiment. The author said that it had. The reported interaction now seemed quite convincing. Impressed both by the results and the prescient planning document, I exclaimed “Really?” The author replied, “No, not really.” The audience chuckled, and the speaker moved on. The reported interaction again struck me as rather unconvincing.
Why did the credibility of this experimental finding hinge on preregistration? Let’s take a step back and use Bayes’ Rule to analyze the process by which prior beliefs were updated in light of new evidence. In order to keep the algebra to a bare minimum, consider a stylized example that makes use of Bayes’ Rule in its simplest form.
Let’s start by supposing that the presenter was in fact following a planning document that spelled out the interaction effect in advance. My hypothesis (H) is that this interaction effect is substantively small (i.e., close to zero). Before attending the talk, my prior belief was that there is a 50% chance that this hypothesis is true. Formally, my prior may be expressed as Pr(H) = 0.5. Next, I encounter evidence (E) that the presenter’s experiment revealed a statistically significant interaction. Suppose there is a 5% probability of obtaining a statistically significant effect given that H is true, which is to say that Pr(E|H) = 0.05. In order to apply Bayes’ Rule, we need one more quantity: the probability of observing a significant result given that H is false (denoted ~H). For a well-powered study such as this one, we may suppose that Pr(E|~H) = 1. In other words, if there were truly a substantively large effect, this study would find it.
Plugging these inputs into Bayes’ Rule allows us to calculate the posterior probability, Pr(H|E), which indicates my degree of belief in H after seeing evidence of a statistically significant finding:

Pr(H|E) = [Pr(E|H)Pr(H)] / [Pr(E|H)Pr(H) + Pr(E|~H)Pr(~H)]
        = (0.05)(0.5) / [(0.05)(0.5) + (1)(0.5)]
        ≈ 0.048

Before seeing the experimental evidence, I thought there was a 0.50 probability of H; now, I accord H a probability of just 0.048. Having seen the presenter’s evidence of a statistically significant effect, my beliefs have changed considerably.
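For readers who prefer to check the arithmetic in code, here is a minimal Python sketch of the update (the helper function posterior is my own naming, introduced purely for illustration):

    def posterior(prior, p_e_given_h, p_e_given_not_h):
        # Bayes' Rule for a binary hypothesis:
        # Pr(H|E) = Pr(E|H)Pr(H) / [Pr(E|H)Pr(H) + Pr(E|~H)Pr(~H)]
        numerator = p_e_given_h * prior
        return numerator / (numerator + p_e_given_not_h * (1.0 - prior))

    # Preregistered analysis: Pr(E|H) = 0.05, Pr(E|~H) = 1
    print(posterior(0.5, 0.05, 1.0))  # 0.0476..., i.e., about 0.048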
What if the presenter obtained this result by fishing for a statistically significant estimate? I don’t know whether the presenter fished, but I do know that fishing is possible because the analysis was not guided by a planning document. Given the possibility of fishing, I reevaluate the probability of observing a significant result even when there is a negligible effect. Above, we supposed that Pr(E|H) = 0.05; now, let’s assume that there is a 75% chance of obtaining a significant result via fishing: Pr(E|H) = 0.75. In that case,

Pr(H|E) = (0.75)(0.5) / [(0.75)(0.5) + (1)(0.5)] ≈ 0.429

Having seen the experimental evidence, I take the probability of H to be 0.429, which is not very different from my prior belief. In other words, when my evaluation of the evidence takes fishing into account, my priors are less influenced by the presenter’s evidence.
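The same sketch, rerun with the fishing-adjusted likelihood (reusing the posterior helper defined above):

    # Possible fishing: Pr(E|H) = 0.75, Pr(E|~H) = 1
    print(posterior(0.5, 0.75, 1.0))  # 0.4285..., i.e., about 0.429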
The broader point is that the mere possibility of fishing can undercut the persuasiveness of experimental results. When I’m confident that the researcher’s procedures are sound, Pr(E|H) is quite different from Pr(E|~H), and the experimental finding really tells me something. When I suspect fishing, Pr(E|H) moves closer to 1, and the experimental findings become less persuasive. (In an extreme case where Pr(E|H) = Pr(E|~H) = 1, the experimental findings would not change my priors about H at all.)
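This limiting case is easy to see numerically. Sweeping Pr(E|H) toward Pr(E|~H) = 1 with the same posterior helper shows the update shrinking to nothing:

    # As Pr(E|H) approaches Pr(E|~H) = 1, the posterior approaches the prior (0.5)
    for p in (0.05, 0.25, 0.50, 0.75, 1.00):
        print(p, round(posterior(0.5, p, 1.0), 3))  # 0.048, 0.2, 0.333, 0.429, 0.5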
This application of Bayes’ Rule suggests that planned comparisons may substantially increase the credibility of experimental results. The paradox is that journal reviewers and editors do not seem to accord much weight to planning documents. On the contrary, they often ask for precisely the sort of post hoc subgroup analyses that create uncertainty about fishing.
The bottom line: proponents of experimental research who want to make the case for preregistration must point out that the nominal results of an experiment are robbed of their persuasive value when readers suspect that the findings were obtained through fishing. In the short run, that means finding fault with existing practice – the lack of planning documents – so that we can improve the credibility of experimental results in the long run.
