Resource Library

The BITSS Resource Library contains resources for learning, teaching, and practicing research transparency and reproducibility, including curricula, slide decks, books, guidelines, templates, software, and other tools. All resources are categorized by i) topic, ii) type, and iii) discipline. Filter results by applying criteria along these parameters or use the search bar to find what you’re looking for.

Know of a great resource that we haven’t included or have questions about the existing resources? Email us!

47 Results

PGRP Onboarding Materials for Collaborative Reproducible Workflows Data Management+EconomicsInterdisciplinaryPolitical ScienceReproducibilityVersion Control

Catalyst Thomas Brailey developed a set of training materials to help transition J-PAL’s Payments and Governance Research Program (PGRP) towards a version-controlled research pipeline by onboarding all research team members to GitHub, GitHub Desktop, and R. These teaching materials can be used to onboard other research and lab teams across a variety of contexts in social science research.

Research Ethics: History & Principles Data ManagementInterdisciplinary

Written by Lily Mayer, this infographic is a helpful introduction to the history and basic principles of research ethics.

A template README for social science replication packages Data Management+EconomicsInterdisciplinaryOther Social SciencesPolitical SciencePsychologyPublic HealthPublic PolicyReproducibility

The template README follows best practices as defined by a number of data editors at social science journals. The full list of endorsers appears under Endorsers. The most recent version is available at https://social-science-data-editors.github.io/template_README/. Specific releases can be found at https://github.com/social-science-data-editors/template_README/releases. The template README is available in a variety of formats, including HTML (best for reading), LaTeX, Word, PDF, and Markdown.

TIER Protocol 4.0 Data Management+InterdisciplinaryReproducibility

The TIER Protocol specifies the contents and organization of reproduction documentation for a project involving computations with statistical data.

Lab Manual for Jade Benjamin-Chung’s Lab Data Management+InterdisciplinaryPublic HealthReproducibility

This is a lab manual for students and staff working with Jade Benjamin-Chung at Stanford University. Its goal is to support collaborative, transparent, and reproducible workflows and it contains guidance on tools and good practices in communications, coding, version control, and data sharing, among others. It also features an internal replication process that increases reproducibility by identifying and resolving errors prior to publication.

ResearchBox Data Management+Interdisciplinary

ResearchBox offers an easy way to share and access scientific content, such as data, code, pre-registrations, and study materials. Uploaded files are organized into “Bingo Tables” that allow readers to easily find & access available files (e.g., researchbox.org/15). Among many features, ResearchBox provides:

One-click downloads
Instantaneous file-previews
Codebooks for every dataset
Integration with AsPredicted.org

Open Research Calendar Data Management+Open PublishingOpen ScienceReproducibilityStatistical Literacy

Open Research Calendar is an open-source community tool that collates information on worldwide events related to open science and research.

Development Research in Practice: The DIME Analytics Data Handbook Data Management+EconomicsEthicsImpact EvaluationInterdisciplinaryInternational DevelopmentPre-Analysis PlansPre-RegistrationStatistical Literacy

“Development Research in Practice” leads the reader through a complete empirical research project, providing links to continuously updated resources on the DIME Wiki as well as illustrative examples from the Demand for Safe Spaces study. The handbook is intended to train users of development data on how to handle data effectively, efficiently, and ethically. See an accompanying online course here.

Framework for Open and Reproducible Research Training (FORRT) Data Management+Dynamic Documents and Coding PracticesInterdisciplinaryPre-Analysis PlansStatistical LiteracyTransparent Reporting

FORRT is a pedagogical infrastructure designed to recognize and support the teaching and mentoring of open and reproducible science tenets in tandem with prototypical subject matters in higher education. FORRT also advocates for the opening of teaching and mentoring materials as a means to facilitate access, discovery, and learning to those who otherwise would be educationally disenfranchised.

Open Science Success Stories Data Management+

The Open Research Funders Group curates the Open Science Success Stories, a database of examples of how openness has benefited researchers and broader society.

Data Citations module Data Management+InterdisciplinaryTransparent Reporting

Created by the Federal Reserve Bank of St. Louis, this module introduces students to the key elements of data citations. See also related modules for Data Literacy.

Handbook on Using Administrative Data for Research and Evidence-Based Policy Data Management+EconomicsInterdisciplinaryInternational DevelopmentReproducibility

Co-edited by Shawn Cole, Iqbal Dhaliwal, Anja Sautmann, and Lars Vilhuber and published by J-PAL’s Innovations in Data and Experiments for Action Initiative (IDEA), this handbook includes case studies of large-scale randomized evaluations using private and national government administrative data, and technical guidance to support partnerships with governments, nonprofits, or firms to access data and pursue cutting-edge, policy-relevant projects.

J-PAL Guide to De-Identifying Data Data Management+International Development

Developed by J-PAL’s Sarah Kopper, Anja Sautmann, and James Turitto, this guide includes:

An overview of personally identifiable information (PII) and the responsibility of data users not to use data to try to identify human subjects
Recommendations for handling direct identifiers (such as full name, social security number, or phone number), as well as indirect identifiers (such as month/year of birth, nationality, or gender)
Guidance on de-identification steps to take throughout the research process, such as encrypting all data containing identifying information as soon as possible
A list of common identifiers, including those labeled by the United States’ Health Insurance Portability and Accountability Act (HIPAA) guidelines as direct identifiers
And more.

See also the accompanying Guide to Publishing Research Data.

J-PAL Guide to Publishing Research Data Data Management+International DevelopmentPublic Policy

Developed by J-PAL’s Sarah Kopper, Anja Sautmann, and James Turitto, this guide includes:

A list of considerations to make before publishing data, such as what information was provided to study participants and the IRB, the sensitivity of the data collected, and legal requirements
Sample consent form language that will allow future publication of de-identified data
A checklist for preparing data for publication
And more.

See also the accompanying Guide to De-identifying Data.

Data Sharing Checklist for NGOs and Practitioners Data Management+Interdisciplinary

This checklist, developed by Teamscope, can help NGOs and practitioners understand the common pitfalls in open data and how open data affects every step of a project’s pipeline, from proposal writing to dissemination.

Videos: Research Transparency and Reproducibility Training (RT2) – Washington, D.C. Data Management+InterdisciplinaryMeta-AnalysesPower analysisPre-Analysis PlansPreprintsRegistriesReplicationsStatistical LiteracyTransparent ReportingVersion Control

BITSS hosted a Research Transparency and Reproducibility Training (RT2) in Washington, D.C., on September 11-13, 2019. This was the eighth training event of this kind organized by BITSS since 2014.

RT2 provides participants with an overview of tools and best practices for transparent and reproducible social science research. Click here to view videos of presentations given during the training. Find slide decks and other useful materials on this OSF project page (https://osf.io/3mxrw/).

Open Data Metrics: Lighting the Fire Data Management+Interdisciplinary

In this book, Daniella Lowenberg and colleagues describe the journey towards open data metrics, prompting community discussion and providing implementation examples along the way. Data metrics are a pre-condition to realize the benefits of open data sharing practices.

Software Carpentry Data Management+Dynamic Documents and Coding PracticesEngineering and Computer ScienceInterdisciplinaryStatistics and Data ScienceVersion Control

Software Carpentry offers online tutorials for data analysis including Version Control with Git, Using Databases and SQL, Programming with Python, Programming with R and Programming with MATLAB.

ResponsibleData.io Data Management+Dynamic Documents and Coding PracticesInterdisciplinaryStatistics and Data Science

Using data for social change work offers many opportunities, but it brings challenges, too. The RD community develops practical ways to deal with the unintended consequences of using data in social change work, establishes best practices, and shares approaches between leading thinkers and doers from different sectors. We discuss thorny topics in-person, facilitate online group discussions on the RD mailing list, and share resources on this site.

Web Plot Digitizer Data Management+InterdisciplinaryStatistics and Data Science

An app that extracts data from charts.

Data Carpentry Lessons Data Management+Interdisciplinary

Developed by Data Carpentry, these lessons can be used across the social sciences to teach data cleaning, management, analysis, and visualization. R is the base language for instruction, and there are no pre-requisites in terms of prior knowledge about this topic.

ARDC FAIR Data self-assessment tool Data Management+Interdisciplinary

This checklist, developed by the Australian Research Data Commons (ARDC), may help researchers make their datasets FAIRer: findable, accessible, interoperable, and reusable.

Whole Tale Data Management+Data VisualizationInterdisciplinaryReplicationsStatistics and Data ScienceVersion Control

Whole Tale is an infrastructure that allows users to share data, methods and analysis protocols, and final research outputs in a single, executable object (“living publication” or “tale”) alongside any research publication. Learn more here.

NRIN Collection of Resources on Research Integrity Data Management+InterdisciplinaryMeta-AnalysesOpen PublishingRegistriesTransparent Reporting

Curated by the Netherlands Research Integrity Network (NRIN), this collection contains literature, tools, guidelines, and educational media related to research integrity. Access the collection here.

PhD Course Materials: Transparent, Open, and Reproducible Policy Research Data Management+Dynamic Documents and Coding PracticesHealth SciencesInterdisciplinaryMeta-AnalysesOpen PublishingPre-Analysis PlansPreprintsPublic PolicyRegistriesReplicationsStatistical LiteracyTransparent ReportingVersion Control

BITSS Catalyst Sean Grant developed and delivered a PhD course on Transparent, Open, and Reproducible Policy Research at the Pardee RAND Graduate School in Policy Analysis. Find all course materials at the project’s OSF page.

Course Syllabi for Open and Reproducible Methods Anthropology, Archaeology, and Ethnography+Data RepositoriesData VisualizationDynamic Documents and Coding PracticesEconomics and FinanceEngineering and Computer ScienceHealth SciencesHumanitiesInterdisciplinaryLife SciencesLinguisticsMeta-AnalysesOpen PublishingOther Social SciencesPolitical SciencePower analysisPre-Analysis PlansPsychologyPublic PolicyRegistriesReplicationsSociologyStatistical LiteracyStatistics and Data ScienceTransparent ReportingVersion Control

A collection of course syllabi from any discipline featuring content to examine or improve open and reproducible research practices. Housed on the OSF.

DMPTool Data Management+Interdisciplinary

The DMPTool is a free service developed by California Digital Library (CDL) and DataONE that helps researchers and institutions create high-quality data management plans that meet funder requirements.

rOpenSci Packages Data Management+Dynamic Documents and Coding PracticesInterdisciplinaryMeta-AnalysesPower analysisReplicationsStatistics and Data ScienceVersion Control

These packages are carefully vetted, staff- and community-contributed R software tools that lower barriers to working with scientific data sources and data that support research applications on the web.

Improving the Credibility of Social Science Research: A Practical Guide for Researchers Data Management+Economics and FinanceInterdisciplinaryPolitical SciencePre-Analysis PlansPsychologyPublic PolicyRegistriesReplicationsSociology

Created by the Policy Design and Evaluation Lab (PDEL) at UCSD, this teaching module was developed to demonstrate the credibility crisis in the social sciences caused by a variety of incentives and practices at both the disciplinary and individual levels, and provide practical steps for researchers to improve the credibility of their work throughout the lifecycle of a project. It is intended for use in graduate-level social science methodology courses—including those in political science, economics, sociology, and psychology—at UCSD and beyond.

These materials were developed as part of a BITSS Catalyst Training Project “Creating Pedagogical Materials to Enhance Research Transparency at UCSD” led by Catalysts Scott Desposato and Craig McIntosh along with Julia Clark, PhD candidate at UCSD.

SPARC (Scholarly Publishing and Academic Resources Coalition) Data Management+Transparent Reporting

This community resource for tracking, comparing, and understanding both current and future U.S. federal funder research data sharing policies is a joint project of SPARC & Johns Hopkins University Libraries.

Impact Evaluation in Practice Data Management+Health SciencesInterdisciplinaryPower analysisPublic Policy

The second edition of the Impact Evaluation in Practice handbook is a comprehensive and accessible introduction to impact evaluation for policymakers and development practitioners. First published in 2011, it has been used widely across the development and academic communities. The book incorporates real-world examples to present practical guidelines for designing and implementing impact evaluations. Readers will gain an understanding of impact evaluation and the best ways to use impact evaluations to design evidence-based policies and programs. The updated version covers the newest techniques for evaluating programs and includes state-of-the-art implementation advice, as well as an expanded set of examples and case studies that draw on recent development challenges. It also includes new material on research ethics and partnerships to conduct impact evaluation.

Nicebread Data Management+Data VisualizationDynamic Documents and Coding PracticesInterdisciplinaryMeta-AnalysesOpen PublishingPower analysisPre-Analysis PlansPreprintsPsychologyRegistriesReplicationsTransparent ReportingVersion Control

Dr. Felix Schönbrodt’s blog promoting research transparency and open science.

The New Statistics (+OSF Learning Page) Data Management+Dynamic Documents and Coding PracticesInterdisciplinaryMeta-AnalysesOpen PublishingPower analysisPre-Analysis PlansPsychologyReplicationsStatistical LiteracyStatistics and Data ScienceTransparent ReportingVersion Control

This OSF project helps organize resources for teaching the “New Statistics” — an approach that emphasizes asking quantitative questions, focusing on effect sizes, using confidence intervals to express uncertainty about effect sizes, using modern data visualizations, seeking replication, and using meta-analysis as a matter of course.

Datavyu Data Management+Data VisualizationPsychology

An Open Data Library for Developmental Science that allows users to decide how they want to code video, audio, physiology, motion, or eye tracking data. Power users can write scripts in the Ruby programming language to extend Datavyu’s functionality.

Databrary Data Management+Data VisualizationDynamic Documents and Coding PracticesPsychology

Databrary is a video data library for developmental science. Anyone collecting shareable research data will be able to store and organize their data within Databrary after completing the registration process.

rpsychologist Data Management+Dynamic Documents and Coding PracticesInterdisciplinaryOpen PublishingPsychology

Kristoffer Magnusson’s blog about R, Statistics, Psychology, Open Science, and Data Visualization.

Handbook of the Modern Development Specialist Data Management+International Development

Created by the Responsible Data Forum, this handbook is offered as a first attempt to understand what responsible data means in the context of international development programming. The authors have taken a broad view of development, opting not to be prescriptive about who the perfect “target audience” for this effort is within the space. This book builds on a number of resources and strategies developed in academia, human rights and advocacy, but aims to focus on international development practitioners. The handbook includes chapters on project design, data management, collection, analysis, sharing, and more.

Open Science Framework Data Management+InterdisciplinaryRegistriesVersion Control

Open Science Framework (OSF) is part version control system, part data repository, and part collaboration software. It allows researchers to move study materials to the cloud, share and find materials, detail individual contributions, make research design more visible, and register materials to certify that the research design was not modified to alter outcomes. To increase workflow flexibility, OSF offers a system where researchers can register a description of their study and its goals. The OSF emphasizes versatility with a very wide range of tools and features, including add-ons from related sites such as Dataverse and GitHub. Uploaded materials can also be archived and receive a Digital Object Identifier (DOI) or Archival Resource Key (ARK).

Dryad Data Management+Engineering and Computer ScienceHealth SciencesInterdisciplinaryLife SciencesOther Social SciencesStatistics and Data Science

Dryad is a curated repository of data underlying peer-reviewed scientific and medical literature, particularly data for which no specialized repository exists. All material in Dryad is associated with a scholarly publication. Its notable features include easy integration into the manuscript submission workflow of its partner journals, the flexibility to make data privately available during peer review, and allowing submitters to set limited-term embargoes post-publication.

Qualitative Data Repository Data Management+InterdisciplinaryPolitical Science

QDR selects, ingests, curates, archives, manages, durably preserves, and provides access to digital data used in qualitative and multi-method social inquiry. The repository develops and publicizes common standards and methodologically informed practices for these activities, as well as for the reusing and citing of qualitative data. Four beliefs underpin the repository’s mission: data that can be shared and reused should be; evidence-based claims should be made transparently; teaching is enriched by the use of well-documented data; and rigorous social science requires common understandings of its research methods.

Scan.R Data ManagementInterdisciplinary

Scan.R searches all Stata (.dta), SAS (.sas7bdat), and comma-separated values (.csv) files found in the specified directory for variables that may contain personally identifiable information (PII) using strings that commonly appear as part of variable names or labels that contain PII. (Note: Scan.R does not search labels in .csv files.) Results are displayed to the screen and saved to a comma-separated values file in the current working directory containing the variables and data flagged as potential PII.
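
The variable-name matching described above can be sketched in a few lines. This is an illustration only, written in Python rather than R, and it is not Scan.R's actual code: the hint list `PII_HINTS`, the function names, and the report file name are all hypothetical, and the sketch covers only .csv headers in a single directory, not Stata or SAS files or variable labels.

```python
import csv
from pathlib import Path

# Hypothetical substrings that often appear in PII variable names.
# This is NOT the list Scan.R uses.
PII_HINTS = ("name", "phone", "email", "address", "birth", "ssn", "gps")


def flag_pii_columns(directory, hints=PII_HINTS):
    """Return (file name, column name) pairs whose header contains a hint."""
    flagged = []
    # Only .csv files directly inside the directory are scanned.
    for path in sorted(Path(directory).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), [])  # empty file -> no columns
        for column in header:
            lowered = column.lower()
            if any(hint in lowered for hint in hints):
                flagged.append((path.name, column))
    return flagged


def write_report(flagged, out_path="pii_flags.csv"):
    """Save the flagged variables to a comma-separated values file."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "variable"])
        writer.writerows(flagged)
```

Calling `write_report(flag_pii_columns("data"))` would scan a folder named `data` and write the report to the working directory. Substring matching is deliberately broad, so a column such as `filename` would also be flagged; the output is a list of candidates for manual review, not a verdict.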

Open Science Training Initiative Data Management+InterdisciplinaryVersion Control

The Open Science Training Initiative (OSTI) provides a series of lectures on open science, data management, licensing, and reproducibility for use with graduate students and postdoctoral researchers. The lectures can be used individually as one-off information sessions on aspects of open science or integrated into an existing course curriculum. Content, slides, and advice sheets for the lectures and other training materials are being gradually released on the GitHub repository as the official release versions become available.

Reproducible Research Data Management+InterdisciplinaryStatistical LiteracyStatistics and Data Science

Reproducible Research, taught by Roger D. Peng, Jeff Leek, and Brian Caffo of Johns Hopkins University, is a course on Coursera that teaches methods to organize data analysis so that it is reproducible and accessible to others. In this course, students will learn to write a document using R Markdown, integrate live R code into a literate statistical program, and compile R Markdown documents using knitr and related tools.

OpenIntro Statistics Data Management+Dynamic Documents and Coding PracticesInterdisciplinaryStatistical LiteracyStatistics and Data Science

OpenIntro Statistics is a free, comprehensive, 400-page online textbook and suite of educational materials on statistics and data analysis.

The Workflow of Data Analysis Using Stata Data Management+InterdisciplinaryStatistical LiteracyStatistics and Data Science

The Workflow of Data Analysis Using Stata, by J. Scott Long, explains how to manage aspects of data analysis, including cleaning data; creating, renaming, and verifying variables; performing and presenting statistical analyses; and producing replicable results.

Experimental Lab Standard Operating Procedures Data Management+Meta-AnalysesPolitical SciencePre-Analysis PlansReplicationsTransparent Reporting

This standard operating procedure (SOP) document describes the default practices of the experimental research group led by Donald P. Green at Columbia University. These defaults apply to analytic decisions that have not been made explicit in pre-analysis plans (PAPs). They are not meant to override decisions that are laid out in PAPs. The contents of our lab’s SOP are available for public use. We welcome others to copy or adapt it to suit their research purposes.