Garret Christensen–BITSS Project Scientist
I’ve recently had the opportunity to represent BITSS at a few interesting meetings and conferences that you might be interested to hear about.
- A group of political scientists and other social scientists met at Stanford for a daylong workshop to discuss steps the discipline could take to improve research reproducibility, especially in light of last year's LaCour Science scandal. We're circulating a document summarizing the workshop internally, so more will eventually be public. For now, a few concrete ideas I liked were:
- Including ethics in core graduate education. Ethics training is part of law and business programs, so why do we assume social scientists don't need similar training?
- Graduate advisers routinely requiring one-click reproducible workflows from the students they advise. If students submitted version-controlled Sweave or R Markdown/knitr files, advisers could more easily monitor and help with the details of their advisees' research, and reproducibility would be much easier.
- Journals should consider requiring authors to share data and code with reviewers at the time of submission, rather than at the time of publication as some journals currently do. This would let reviewers check reproducibility (a very few journals have staff who check the reproducibility of submitted materials, which is also good). However, this could lead to scooping, so a non-disclosure agreement might be required.
- I teamed up with Nicole Janz of Cambridge to lead a session on replication at Mozilla's MozFest in London. I put together an OSF project and GitHub repository for the materials, but the basic idea was to search for and find the data for a couple of papers, and then use R to try to replicate the results. It took Nicole and me quite a lot of searching to find papers written in R, since most economists use Stata. We wanted something the public could do, and Stata requires a paid license, so it wouldn't work at an event like this. I was able to find only two:
- Jensenius, Francesca Refsum. 2015. "Development from Representation? A Study of Quotas for the Scheduled Castes in India." American Economic Journal: Applied Economics 7(3): 196–220.
- Avery, Robert B., and Kenneth P. Brevoort. 2015. "The Subprime Crisis: Is Government Housing Policy to Blame?" The Review of Economics and Statistics 97(2): 352–363.
The statistics in the former are a little complicated, so we went with the latter. MozFest attracts mostly programmers, so it was interesting to see their reaction. From my perspective, Avery and Brevoort's paper replicates quite easily, if not perfectly. The data and code are on Dataverse (I'd link to it, but the servers are down right now), and all the main results tables replicate exactly. As far as I can tell, the remaining issues are these: you have to change the data-loading command because the shared data is in a tabular format rather than R's native format; the summary statistics I get differ significantly from the published Table 1; and the graphs in the main figure show very minor differences at their right end. I don't know whether any of that comes from differences in the versions of R or its packages. (I did have to update R to get the code to work, because of some recent change to a basic R function.) For the software developers, the lack of a makefile alone was enough to make them throw up their hands in disgust. (Do any social science papers share a makefile? I barely know what one is, and I definitely could not write one without help.)
Anyhow, it was an interesting experience to talk about replication with people who are much better programmers than I am. By the way, if you're looking for social science papers in R, the place to go is probably Yale's Institution for Social and Policy Studies, because they actually rewrite the Stata code for articles in R. It's probably a great way to learn R, too.
- I attended the recent METRICS conference at Stanford. We heard from panels on data transparency, peer review and evaluation, research policy and incentives, methods, and education and engagement. We then broke into groups to discuss specific policy ideas, and finally voted as a whole on which ideas we liked best. There were good suggestions for changing the policies of journals, departments, and the peer review process, but there was also a strong call to run a randomized trial of any new policy, so that we actually know whether it works. (Aside: this seems similar in nature to the iCOMPARE study on optimal shift length for medical residents. If it's possible to randomize ethically, sure, let's do it!) The specific ideas I liked most were:
- The NIH and NSF ought to consider whether applicants shared the data from their previous grants when evaluating current proposals (or simply hold back money until the data is posted). Everybody is already legally required to share this data, but there's no enforcement. Just use money as the incentive. Seems obvious to me, but I'm an economist.
- The FDA ought to create a restricted-access data enclave, like those the Census Bureau and CDC already have. Again, this seems obvious to me.