By Macartan Humphreys (Political Science, Columbia & EGAP)
I am sold on the idea of research registration. Two things convinced me.
First, I have been teaching courses in which each week we try to replicate prominent results produced by political scientists and economists working on the political economy of development. I advise against doing this because it is very depressing. In many cases data is not available or results cannot be replicated even when it is. But even when results can be replicated, they often turn out to be extremely fragile. Look at them sideways and they fall over. The canon is a lot more delicate than it lets on.
Second, I have tried out registration for myself. That was also depressing, this time because of what I learned about how I usually work. Before doing the real analysis on data from a big field experiment on development aid in Congo, we (Raul Sanchez de la Sierra, Peter van der Windt and I) wrote up a “mock report” using fake data on our outcome variables. Doing this forced us to make myriad decisions about how to do our analysis without the benefit of seeing how the analyses would play out. We did this partly for political reasons: a lot of people had a lot invested in this study and if they had different ideas about what constituted evidence, we wanted to know that upfront and not after the results came in. But what really surprised us was how hard it was to do. I found that not having access to the results made it all the more obvious how much I am used to drawing on them when crafting analyses and writing, even for simple decisions such as which exact measure to use for a given concept, which analyses to deepen, and which results to emphasize. More broadly, that’s how our discipline works: the most important peer feedback we receive, from reviewers or in talks, generally comes after our main analyses are complete and after our peers are exposed to the patterns in the data. For some purposes that’s fine, but it is not hard to see how it could produce just the kind of fragility I was seeing in published work.
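To make the mock report idea concrete, here is a minimal sketch of what fixing an analysis before seeing real outcomes might look like. Everything in it is an assumption made for illustration: the function names `make_fake_data` and `planned_analysis`, the number of villages, and the difference-in-means estimator are hypothetical and are not taken from the Congo study.

```python
import random
import statistics


def make_fake_data(n_villages=200, seed=1):
    """Build placeholder data: invented units with pure-noise outcomes."""
    rng = random.Random(seed)
    return [
        {
            "village": i,
            "treated": i % 2,            # placeholder assignment, not a real design
            "outcome": rng.gauss(0, 1),  # noise standing in for the real measure
        }
        for i in range(n_villages)
    ]


def planned_analysis(rows):
    """The analysis written down in advance: a simple difference in means."""
    treated = [r["outcome"] for r in rows if r["treated"] == 1]
    control = [r["outcome"] for r in rows if r["treated"] == 0]
    estimate = statistics.mean(treated) - statistics.mean(control)
    std_error = (
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    ) ** 0.5
    return {"estimate": estimate, "std_error": std_error, "n": len(rows)}


# The "mock report": the planned analysis run on fake outcomes.
mock_report = planned_analysis(make_fake_data())
print(mock_report)
```

The point of a sketch like this is that `planned_analysis` can later be run unchanged on the real data, so choices such as which measure to use are made before the results can inform them.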
These experiences convinced me that our current system is flawed. Registration offers one possible solution.
There are a couple of different rationales for registration. One fairly unassailable one is that it is useful to have a census of the population of studies done in an area—whether published or not—in order to put individual findings in context. But I am more motivated by the idea that registration might improve the way we work and increase the credibility of individual studies by limiting the scope for fishing (see our paper on this here).
Many others in political science have also been warming to the idea. The group I know best that has been thinking about this is the EGAP network. EGAP is a network of 70 researchers and practitioners working on political economy using experimental approaches. Most work on governance in developing regions; others work on related themes in developed areas or on research methodologies. All members of the group have endorsed a statement of principles, one of which includes the registration idea (Principle 2). To date, 18 projects have been registered on the EGAP website (see here); many of these provide considerable detail, including complete code for the replication of results before any results are produced. The group also developed a proposal for introducing a more institutionalized registration system to work in tandem with journals.
The core of that proposal was to have a voluntary and nonbinding registration system: one in which no one has to register, and those who do register retain license to deviate from their plan. The carrot is some form of recognition by journals that an article is “registration compliant”, for example by putting a badge on the front page of the article. In the original plan, registry staff would certify the extent to which ultimate analyses differed from proposed analyses, though that part of the plan was later dropped for fear of introducing too evaluative a component. In the present proposal this certification is replaced with an expectation that authors describe deviations in their texts. Any reader can then verify as they see fit.
The basic idea of the proposal is in a way extremely toothless. Say what you plan to do before you do it—or don’t, if you don’t want to. Just be clear one way or the other.
Toothlessness notwithstanding, I have seen plenty of negative reactions to this. A lot of the opposition has focused on six counterarguments. The first three are not so much arguments against as arguments that registration is not necessary, a consideration that is salient since registration places a burden on researchers. The last three are more pointedly against.
The real problem is classical hypothesis testing and that’s what should be gotten rid of. The problem of data fishing is especially obvious for classical hypothesis testing, given its focus on arbitrary thresholds and the seemingly unbearable pressure to be significant. The extraordinary analysis by Gerber and Malhotra shows how the form that fishing takes is powerfully shaped by the structure of classical tests. But really anyone can fish. All you need is latitude in how you define your analyses. Bayesians can fish as well as the next person by selecting models that produce outcomes consistent with their theories or worldviews. So a mass conversion to Bayes won’t solve the problem, though it might take the pressure off.
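A small simulation can illustrate how much latitude alone matters. The sketch below assumes a setting invented for illustration: there is no true effect, an analyst has ten candidate outcome measures (all pure noise), and either commits to the first measure in advance or reports whichever gives the smallest p-value. The function names, sample sizes, and the normal-approximation test are all hypothetical choices, not anything from the studies discussed here.

```python
import math
import random
import statistics


def p_value_diff_in_means(treated, control):
    """Two-sided p-value from a normal approximation to the difference in means."""
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    z = (statistics.mean(treated) - statistics.mean(control)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_sims=500, n_units=50, n_measures=10, alpha=0.05, seed=7):
    """Return (rejection rate with a prespecified measure, rate when fishing)."""
    rng = random.Random(seed)
    prespecified_hits = 0
    fished_hits = 0
    for _ in range(n_sims):
        p_values = []
        for _ in range(n_measures):
            # Treatment has no effect: both groups come from the same distribution.
            treated = [rng.gauss(0, 1) for _ in range(n_units)]
            control = [rng.gauss(0, 1) for _ in range(n_units)]
            p_values.append(p_value_diff_in_means(treated, control))
        prespecified_hits += p_values[0] < alpha   # measure chosen in advance
        fished_hits += min(p_values) < alpha       # measure chosen after looking
    return prespecified_hits / n_sims, fished_hits / n_sims


prespecified_rate, fished_rate = simulate()
print(f"prespecified measure: {prespecified_rate:.3f}")
print(f"best of ten measures: {fished_rate:.3f}")
```

With ten independent noise measures, the chance that at least one clears the 0.05 threshold is about 40 percent, so the fished rate should come out far above the prespecified one. The same logic applies to any selection rule applied after seeing results, not only to p-value thresholds.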
The real problem is one of multiple comparisons and so people just need to improve their analyses. The fishing problem sometimes sounds like a multiple comparisons problem which could be fixed by using the right approaches. If researchers conduct ten tests of a single hypothesis and reject the null on just one, then they should take account of the fact that they have done ten analyses when conducting the tests. That is right as far as it goes. But it is still possible to make a mess of a multiple comparisons problem without fishing (by reporting all the results, albeit with misleading p-values), and it is also possible to engage seriously with the multiple comparisons problem but still fish (by not reporting the result of the multiple comparisons analysis if you don’t like it). So improvements in analysis techniques won’t solve the problem.
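For readers who want to see what taking account of ten analyses can look like, here is a minimal sketch using two standard adjustments, Bonferroni and Holm. The ten p-values are invented for illustration, and the function names are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment, returned in the original order of the tests."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


# Invented p-values: ten tests of one hypothesis, one of them below 0.05.
p_values = [0.03, 0.20, 0.35, 0.41, 0.48, 0.55, 0.62, 0.70, 0.81, 0.93]

# Chance of at least one p < 0.05 across ten independent tests of a true null.
familywise_error = 1 - (1 - 0.05) ** len(p_values)

bonferroni_adjusted = bonferroni(p_values)
holm_adjusted = holm(p_values)
print(f"familywise error rate: {familywise_error:.3f}")
print("any Bonferroni-adjusted p < 0.05:", any(p < 0.05 for p in bonferroni_adjusted))
print("any Holm-adjusted p < 0.05:", any(p < 0.05 for p in holm_adjusted))
```

Note that either adjustment only helps if all ten tests are fed into it and the adjusted result is reported, which is exactly the step that fishing skips.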
Registration and Replication are Substitutes. A number of people have argued that we should be focusing more on replication than registration. The idea is that truth will be found by the interaction of researchers robustly challenging and replicating each other’s work. In such an environment, fragile results will not survive. The worry is that we need dialogue to be vigorous but registration requirements might just slow it down. I don’t share the optimism that the multiplication of contradictions leads to the discovery of truth and I would much rather live in a world in which I felt that I could largely believe the abstract of a paper rather than have to suspend belief until I can get hold of the data and see where the bodies are buried. But even if I shared this optimism it looks to me like registration is more likely to act as a complement to replication than a substitute. Both are consistent with the broader goal of research transparency and posting plans for analysis likely makes the path to replication easier, at least to the extent that registration systems encourage detailed and public descriptions of method. In fact in some cases already, registration has resulted in replication structures being available even before the basic analysis is undertaken.
Registration will force people to implement analyses they know to be inappropriate. Some have argued that under registration norms researchers would be bound to follow analysis plans that they know to be suboptimal once they encounter the data. I think this argument is a bit of a red herring. Many studies that have used analysis plans deviate from them in some way or other (and state when they do). What’s more, present proposals for registration do not require authors to stick rigidly to plans. From a commitment perspective there is not much bite to the EGAP proposal, for example. But registration might still have a powerful communication function even if there is no formal commitment function (that’s the key point in our paper on this). It clarifies when analysis decisions have been informed by results and when they have not.
Registration will facilitate plagiarism. In principle it is possible that someone registers a research design and someone else then takes the design and races to implement it first (the opposite argument can also be made: that registering a design can provide a mechanism for rights claiming by the person registering). For some sorts of work this is not a major risk, especially when the value added lies in the implementation more than the design. But even if stealing is a risk, there is a simple solution, which is to allow an option of keeping designs private until such time as they are published.
Registration will prevent exploration. Perhaps the most common concern is that registration will prevent exploratory research. A lot of what political scientists do is exploratory, and probably has to be, and registration, if done right, should not prevent that. The problem as I see it, though, is not that we do a lot of exploratory work but that we don’t admit to it when we present our results. Under a registration regime there would be at least five strategies that researchers could use if they wanted to do exploration in an area where priors or theory are very weak.
a. Declare that you want to do soak and poke exploration (by which I mean exploration that might be valuable but is not itself amenable to ex ante description) and signal that that is what you are doing by not registering. Or perhaps register a part of an analysis and draw a firewall between the registered and unregistered parts.
b. Do principled exploration and register the process used for knowledge discovery. There have been many developments in the fields of machine learning and (proper) datamining that can be used for this.
c. Declare that really you are interested in the estimation of various quantities, not tests of particular claims about quantities, and register that.
d. Register some weakly motivated hypotheses because you feel you have to register something concrete, and then fail to listen to what the data is trying to tell you when you see things working very differently.
e. Forgo exploratory analysis. The incentives to engage in exploration may go down if researchers can no longer claim that some extraordinary pattern vindicates a theory that just formed in their minds at the same time as the results came in. If the possibility of misrepresenting their research is what keeps people exploring then exploration might go down.
Of these five, d. and (possibly) e. are likely the worst outcomes; the other three all have clear benefits over the current approach. The key risk as I see it is that rather than becoming more transparent (by doing a or b) research just becomes more obviously silly (d) or more conservative (e).
Of these six arguments, the last one is probably the most important and I expect conversation will increasingly focus on it. But to be clear, the real question here is not whether exploration would be possible under registration norms but whether registration would weaken incentives to explore to the point of impeding genuine discovery. Or, put another way, whether incentivizing exploration requires a mechanism that makes it impossible to know when exploration is actually happening.
What does the future hold?
Let me close by going off the deep end and registering some predictions right here. These are things I expect to happen, mixed in with things I hope to see happen.
Registration will happen. My first prediction: I think it is inevitable that we will see some sort of move towards registration in political science.
Who will do it? I expect that experimental researchers will take the lead on this. But much of the core work in our discipline is observational, and much of it is qualitative. This work can suffer from the same problems of reporting and publication bias as experimental work. I hope any registration initiatives in political science will create a facility that can be used for both observational and experimental research.
Will there be bite? Present proposals have no bite in the sense that registration is not formally required by anyone. But I expect there will be bite if recognition for some leads researchers to feel that their claims to be conducting formal tests will not be taken seriously unless they state those tests up front. That feeling might be right, and in my already pessimistic view that would not be a bad thing.
Handling nulls. I think it likely that when people start registering more we will start seeing more null results, perhaps reflecting more accurately how hard it is to make robust predictions about complex processes. Three possible effects of this are that (a) researchers will try to protect themselves from null results by engaging in fewer, larger studies; (b) there will be greater focus on the ex ante motivation for empirical analysis, be it through a greater reliance on theory or greater engagement with cumulation; and (c) in light of these two effects there may be more interest in, and tolerance for, null results. More generally I think it likely that there will be a broader shift in the goals of analysis from testing to estimation.
Language shifts. I optimistically predict that registration norms will make us both more accepting of, and more demanding of, exploratory research and that this will be reflected in writing. There will be less pressure on (and less scope for) researchers to claim they are testing if they are not, but also more scrutiny of the ways that exploratory work is undertaken.
Development of methods for assessing claims in light of past knowledge. The benefits of registration are clearest for analyses on data that is not yet available at the time of registration. The problem with historical data is that we often already know lots about the historical patterns before we undertake any new analysis. The fishing has already happened. Given the importance of historical work in the discipline, we will have to develop different ways of distinguishing novelty from confirmation in the analysis of historical data (this could perhaps be done by developing methods to assess “disciplinary priors” prior to any new analysis).
So there it is. Six predictions, most of them vague. A good chance I think that at least one of them will work out.