Teaching Integrity in Empirical Research

Richard Ball (Economics Professor at Haverford College and presenter at the 2014 BITSS Summer Institute) and Norm Medeiros (Associate Librarian at Haverford College) discussed Project TIER (Teaching Integrity in Empirical Research) and their experience teaching students how to document their empirical analyses in a recent interview on the Library of Congress blog The Signal.

What is Project TIER?

For close to a decade, we have been teaching our students how to assemble comprehensive documentation of the data management and analysis they do in the course of writing an original empirical research paper. Project TIER is an effort to reach out to instructors of undergraduate and graduate statistical methods classes in all the social sciences to share with them lessons we have learned from this experience.

What is the TIER documentation protocol?

We gradually developed detailed instructions describing all the components that should be included in the documentation and how they should be formatted and organized. We now refer to these instructions as the TIER documentation protocol. The protocol specifies a set of electronic files (including data, computer code and supporting information) that would be sufficient to allow an independent researcher to reproduce, easily and exactly, all the statistical results reported in the paper.
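
As a rough illustration of the idea, and not the TIER protocol itself, the sketch below checks that a project folder contains a non-empty subfolder for each of the three kinds of files mentioned above. The folder names (`data`, `code`, `supporting_information`), the file names, and the helper functions are all hypothetical choices made for this example.

```python
from pathlib import Path
import tempfile

# Hypothetical folder names: the interview only says the protocol covers
# data, computer code and supporting information.
REQUIRED_COMPONENTS = ("data", "code", "supporting_information")


def missing_components(project_dir):
    """Return the required component folders that are absent or empty."""
    root = Path(project_dir)
    missing = []
    for name in REQUIRED_COMPONENTS:
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


def demo():
    """Build a throwaway project with only data and code, then check it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data").mkdir()
        (root / "data" / "survey.csv").write_text("id,income\n1,50000\n")
        (root / "code").mkdir()
        (root / "code" / "analysis.py").write_text("# analysis steps go here\n")
        return missing_components(root)
```

Calling `demo()` reports the one component this toy project lacks. A check like this, run before submission, would only confirm that the pieces are present; whether they actually reproduce the reported results still has to be verified by rerunning the code.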

What are the benefits for the students who follow this protocol?

When students know from the outset that they will be required to turn in documentation showing how they arrive at the results they report in their papers, they approach their projects in a much more organized way and keep much better track of their work at every phase of the research. Their understanding of what they are doing is therefore substantially enhanced, and I in turn am able to offer much more effective guidance when they come to me for help.

What do students think of TIER?

There are always a few wrinkles to work out, and sometimes there is a bit of grumbling, but as soon as students start working seriously with their data they see how useful it was to do that up-front preparation. They realize quickly that organizing their work as prescribed by the protocol increases their efficiency dramatically, and by the end of the semester they are totally sold: they can’t imagine doing it any other way.

What other parallel efforts do you see arising?

In sociology, Scott Long of Indiana University is a leader in the development of best practices in responsible data management and documentation. The Center for Open Science, led by psychologists Brian Nosek and Jeffrey Spies of the University of Virginia, is developing a web-based platform to facilitate pre-registration of experiments as well as replication studies […] The Inter-university Consortium for Political and Social Research (ICPSR), which for over 50 years has served as a preeminent archive for social science research data, is also making important contributions to responsible data stewardship and research credibility.

These bottom-up efforts also align well with several federal initiatives. Since 2011, the NSF has required all proposals to include a “data management plan” outlining procedures that will be followed to support the dissemination and sharing of research results. Similarly, the NIH requires all investigator-initiated applications with direct costs greater than $500,000 in any single year to address data sharing in the application. More recently, in 2013 the White House Office of Science and Technology Policy issued a policy memorandum titled “Increasing Access to the Results of Federally Funded Scientific Research,” directing all federal agencies with more than $100 million in research and development expenditures to establish guidelines for the sharing of data from federally funded research.

Have you incorporated these or other tools into your pedagogy?

In fall 2013, we experimented with using Dataverse directly with students […] Our Project TIER Dataverse is available online […] This fall we plan to use the Open Science Framework system to see if it can serve our students slightly better.

Although the protocol was first developed for use with the Stata statistical package, a software-neutral version has been released. Versions adapted for the R and SPSS statistical packages are under development, and a SAS version is planned. The protocol is not compatible with Excel. Instructions for the TIER protocol are available online. The full interview can be found here.