The Need for Pre-Analysis: First Things First

By Richard Sedlmayr (Philanthropic Advisor)

When we picture a desperate student running endless tests on his dataset until some feeble point finally meets statistical reporting conventions, we are quick to dismiss the results. But the underlying issue is ubiquitous: it is hard to analyze data without getting caught in a hypothesis drift, and if you do not seriously consider the repercussions on statistical inference, you too are susceptible to picking up spurious correlations. This is also true for randomized trials that otherwise go to great lengths to ensure clean causal attribution. But experimental (and other prospective) research has a trick up its sleeve: the pre-analysis plan (PAP) can credibly overcome the problem by spelling out subgroups, statistical specifications, and virtually every other detail of the analysis before the data is in. This way, it can clearly establish that tests are not a function of outcomes – in other words, that results are what they are.
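To put a rough number on the desperate student's problem, here is a minimal sketch of how quickly spurious findings pile up. It rests on assumptions that are not part of the argument above: every test is independent, every null hypothesis is true, and the reporting convention is a 5% significance threshold. The function name `familywise_error_rate` is purely illustrative.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests.

    Assumes every null hypothesis is true and the tests are independent,
    which is a simplification: tests run on one dataset are usually correlated.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    # Probability that all tests stay quiet is (1 - alpha) ** num_tests;
    # the complement is the chance that at least one "finding" appears.
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for k in (1, 5, 20, 100):
        rate = familywise_error_rate(k)
        print(f"{k:>3} tests: {rate:.0%} chance of at least one spurious result")
```

Under these assumptions, twenty unplanned tests already give roughly a 64% chance of at least one result that clears the threshold by luck alone. Real analyses are messier, but the direction of the problem is the same.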

So should PAPs become the new reality for experimental research? Not so fast, say some, because there are costs involved. Obviously, it takes a lot of time and effort to define a meaningful analysis of a dataset that isn’t even in yet. But more importantly, there is a risk that following a PAP backfires and actually reduces the value we get out of research: perhaps one reason why hypothesis drift is so widespread is that it is a cost-effective way of learning, and by tying our hands, we might stifle the valuable processes that can only take place once the data is in. Clearly, powerful insights that came out of experimental work, in both social and biomedical research, have been serendipitous. So are we stuck in limbo, “without a theory of learning” that might provide some guidance on PAPs?

I argue that much of the discussion about the merits and dangers of PAPs is beside the point. The problem is not that the research process is inherently flawed because data must be analyzed before it exists; the problem is that researchers got used to presenting exploratory analyses as hypothesis-driven when they aren’t. One straightforward solution is simply to be transparent about what is what, and to document research honestly: if the research process can be understood, there is no need to regulate it. So before discussing the need for hundred-page PAPs, I believe we should focus our attention on making public what we are already used to producing: proposals, data, code, et cetera. Simply getting access to these would allow people to get a fairly good sense of the degree of exploration that has been undertaken (and expose a lot of other useful things, like errors). Sure, the distinction between chance and discovery might not become crystal clear, but that’s OK, because our understanding of how useful knowledge is generated is equally murky. And over time, a change in transparency norms will likely lead researchers to put more thought into documenting their hypotheses, simply out of reputational concerns.

So I believe that collective action on PAPs isn’t required, but that it is time to get our act together on transparency – a classic public goods problem, and one that regulation can help solve. A transparency policy for a funder could look something like this. But if you are a researcher, don’t wait: the tools to document and share the full details of your research, warts and all, already exist.


About the author:
Richard Sedlmayr is a philanthropic advisor based in New York City.

This post is one of a ten-part series in which we ask researchers and experts to discuss transparency in empirical social science research across disciplines. It was initially published on the CEGA blog on March 20, 2013. You can find the complete list of posts here.