Guest post by Olivia D’Aoust, Ph.D. in Economics from Université libre de Bruxelles, and former Fulbright Visiting Ph.D. student at the University of California, Berkeley.
As a Fulbright PhD student in development economics from Brussels, my experience this past year on the Berkeley campus has been eye-opening. In particular, I discovered a new movement toward improving the standards of openness and integrity in economics, political science, psychology, and related disciplines, led by the Berkeley Initiative for Transparency in the Social Sciences (BITSS).
When I first discovered BITSS, it struck me how little I knew about research on research in the social sciences, the pervasiveness of fraud in science in general (from data cleaning and specification searching to faking data altogether), and the basic lack of consensus on what the right and wrong ways to do research are. These issues are essential, yet too often they are left by the wayside. Transparency, reproducibility, replicability, and integrity are the building blocks of scientific research.
Wanting to dig deeper, I attended Prof. Edward Miguel’s class on Transparency in Research. This was one of the first such courses in the social sciences. The course is mostly devoted to development topics, but it has drawn economists, epidemiologists, political scientists, psychologists, and others. When I joined the class, I was finishing writing my thesis, and I envied the students who were just starting their PhDs and could already take such a course. If they can implement what they learned, they will “get to the right answer, not get an answer”. This is the fundamental goal of research, which has gone wrong in many ways in recent years.
I once heard that it is harder and more time-consuming to write a paper with bad data, endlessly going back to fix things, than to collect the data properly the first time. It may seem obvious, but many researchers think too quickly about their survey design, collect data, and then realize that they asked (many) irrelevant questions, lack statistical power, and so on. Because departments do not have millions of dollars to collect data or conduct experiments, budget constraints should be properly taken into account at the design stage.
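As a rough illustration of the statistical power point, the sketch below estimates how many observations per group a simple two-group comparison of means would need. It is only a back-of-the-envelope approximation under stated assumptions: two equally sized groups, a two-sided test, a normal approximation, and an effect size expressed in standard deviations. The function name `sample_size_per_arm` and the default values (5% significance, 80% power) are illustrative choices, not something prescribed by BITSS or the course.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate observations needed in EACH of two equal-sized groups.

    Assumptions (illustrative only): two-sided test on a difference in
    means, normal approximation, and `effect_size` given in standard
    deviations (difference in means divided by the outcome's SD).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power
    # Classic formula: n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5):
        print(f"effect size {d}: about {sample_size_per_arm(d)} per group")
```

Under these assumptions, detecting an effect of 0.2 standard deviations takes roughly six times as many respondents per group as detecting an effect of 0.5, which is exactly the kind of trade-off worth knowing before the survey budget is spent.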
Being transparent starts with elaborating a research plan. From there, it does not require much effort to write it up and register it as a pre-analysis plan (see this BITSS post or the World Bank checklist). Using a similar logic, a professor once told me that I could bring my own hand-written notes to the exam, and could literally copy the book, because when someone writes something down, that process forces them to stop and think clearly about what they are writing. Most of us cannot write down something that we do not understand or that does not make sense. I think this also applies to research. You will not register a plan that is doomed to fail.
For a lot of questions, you do not necessarily need to collect your own data. We tend to forget that there are many datasets in the public domain and that many authors are willing to share theirs. These datasets are increasingly geo-referenced and can be easily merged with other publicly available data. For such datasets, it is harder to build credibility from a pre-analysis plan, but nothing prevents you from writing one for yourself and your co-authors, and sticking to it.
Here are some of the tools at our disposal to facilitate the process of making research more transparent and reproducible:
- When you can, register pre-analysis plans. The most flexible platform is the Open Science Framework (OSF), which is also a great tool for writing and collaborating on any project. It recently added Zotero and Mendeley integrations, which allow sharing references (see Getting Started with the OSF).
- On the tech side: for those who still have doubts about diving into R, check out its friendlier user interface, RStudio. It is free, flexible, and open-source. Anybody can write functions, so there are packages to do just about anything. Among others, and for this post’s purpose, tools such as Git, knitr, and Sweave allow you to collaborate with peers, replicate their analysis, and share code that compiles directly into presentations, Word documents, or LaTeX documents. New needs on the research side are quickly being met too. Coursera offers a great, free data science module (which includes how to use the above-mentioned tools).
- Even before publication (peer review can be a long process), you can already share some code on GitHub (even without data, or with simulated data). Sharing is more important than you think. Share what you learn, how you learned it, and what you have accomplished if you think it can help others. This is not limited to code: it could be anything you had a hard time finding, like different approaches to a problem, data collection tips, etc. We learn when we fail. Unfortunately, errors and solutions are still rarely shared.
- Track and document all the steps of your research. It will be beneficial for both you and your peers. For you, it can be really painful to revise a paper submitted months ago if you are not able to quickly understand what you had done. It also gives you credibility once your paper, strategy, and data are released, if others are able to replicate your work. I have used a lot of great code shared by others, and learned a lot from it. Try to find great papers in your field on Harvard Dataverse (and share yours once published).
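To illustrate the idea of sharing code with simulated data, here is a minimal sketch of how a stand-in dataset could be generated when the real data cannot be released yet. Everything in it is hypothetical: the function names, the variable names (`household_id`, `treated`, `income`), and the distributions are made up for illustration and do not come from any actual study. The fixed seed is what lets anyone who runs the script obtain exactly the same rows.

```python
import csv
import random


def simulate_households(n, seed=12345):
    """Generate n fake household records with a fixed seed for reproducibility.

    All variables and distributions are invented for illustration only.
    """
    rng = random.Random(seed)  # private generator, so results do not depend on global state
    rows = []
    for i in range(n):
        treated = rng.randint(0, 1)  # fake treatment indicator (0 or 1)
        # Fake income: mean 100, plus 10 if treated, standard deviation 20
        income = round(rng.gauss(100 + 10 * treated, 20), 2)
        rows.append({"household_id": i + 1, "treated": treated, "income": income})
    return rows


def write_csv(rows, path):
    """Write the simulated records to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_csv(simulate_households(500), "simulated_households.csv")` would produce a file that can be posted alongside the analysis code, so that others can at least run the scripts end to end before the real data become available.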
It is our job to share methodologically sound practices that help answer important questions. It is our job to be open to discussion, debate, and disagreement. You do not want to get ahead by cheating. The satisfaction you will get will be much greater if you do things right (which by now should be an objective concept). Do not let practicality get in the way of rigor.
About the Author: Olivia D’Aoust recently obtained her Ph.D. in Economics from the European Center for Advanced Research in Economics and Statistics (ECARES) at the Solvay Brussels School of Economics and Management, Université libre de Bruxelles. In 2014-2015, she was a Fulbright visiting student researcher at the University of California, Berkeley. She holds Master’s degrees in economics and in demography from Université catholique de Louvain (2008 and 2012). During her PhD, she studied post-war economics, drawing on micro-level evidence from the African Great Lakes region. She is particularly interested in Development Economics, Civil War, Urbanization, Public Health, Demography, and Applied Microeconometrics.