The BITSS Resource Library contains resources for learning, teaching, and practicing research transparency and reproducibility, including curricula, slide decks, books, guidelines, templates, software, and other tools. All resources are categorized by i) topic, ii) type, and iii) discipline. Filter results by applying criteria along these parameters or use the search bar to find what you’re looking for.
Know of a great resource that we haven’t included or have questions about the existing resources? Email us!
SPARC (Scholarly Publishing and Academic Resources Coalition) [Data Management and De-identification, Transparent Reporting]
Royal Society Open Science Registered Reports [Health Sciences, Other Social Sciences, Pre-Analysis Plans, Psychology, Replications, Results-Blind Review & Registered Reports]
Royal Society Open Science is a fast, open-access journal publishing high-quality research across all of science, engineering, and mathematics. A Registered Report (RR) is a form of journal article in which methods and proposed analyses are pre-registered and peer-reviewed before the research is conducted (Stage 1). High-quality protocols are then provisionally accepted for publication before data collection commences. The format is open to replication attempts as well as novel studies. Once the study is completed, the authors finish the article, including the results and discussion sections (Stage 2). This is appraised by the reviewers and, provided the necessary conditions are met, published.
Accountable Replications Policy “Pottery Barn” [Dynamic Documents and Coding Practices, Open Publishing, Psychology, Replications]
The Accountable Replication Policy commits the Psychology and Cognitive Neuroscience section of Royal Society Open Science to publishing replications of studies previously published within the journal. Authors can either submit a replication study that is already completed or a proposal to replicate a previous study. To ensure that the review process is unbiased by the results, submissions will be reviewed with existing results initially redacted (where applicable), or in the case of study proposals, before the results exist. Submissions that report close, clear, and valid replications of the original methodology will be offered in-principle acceptance, which virtually guarantees publication of the replication regardless of the study outcome.
Metametrik [Data Repositories]
Metametrik is a prototype platform for storing and searching econometric results, a project led by the Open Economics Group of the Open Knowledge Foundation. In this prototype, regression results are entered into a spreadsheet by an informed researcher at the level of a single regression. The platform then enables faceted search by dependent variable, independent variable, model, controls, journal, year, authors, JEL codes, and keywords.
statcheck Web App [Interdisciplinary, Metascience (Methods and Archival Science), Psychology, Replications, Transparent Reporting]
statcheck is a program that checks for errors in statistical reporting in APA-formatted documents. It was originally written in the R programming language. statcheck/web is a web-based implementation of statcheck. Using statcheck/web, you can check any PDF for statistical errors without installing the R programming language on your computer.
Gates Open Research [Health Sciences, Open Publishing, Other Social Sciences, Preprints]
Gates Open Research is a scholarly publishing platform that makes research funded by the Bill & Melinda Gates Foundation available quickly and in a format supporting research integrity, reproducibility and transparency. Its open access model enables immediate publication followed by open, invited peer review, combined with an open data policy.
Impact Evaluation in Practice [Data Management and De-identification, Health Sciences, Interdisciplinary, Power analysis, Public Policy]
The second edition of the Impact Evaluation in Practice handbook is a comprehensive and accessible introduction to impact evaluation for policymakers and development practitioners. First published in 2011, it has been used widely across the development and academic communities. The book incorporates real-world examples to present practical guidelines for designing and implementing impact evaluations. Readers will gain an understanding of impact evaluation and the best ways to use impact evaluations to design evidence-based policies and programs. The updated version covers the newest techniques for evaluating programs and includes state-of-the-art implementation advice, as well as an expanded set of examples and case studies that draw on recent development challenges. It also includes new material on research ethics and partnerships to conduct impact evaluation.
J-PAL Hypothesis Registry [Pre-Analysis Plans]
The Abdul Latif Jameel Poverty Action Lab (J-PAL) hypothesis registry accepted submissions from 2009 to 2013, and was then replaced by the AEA’s registry, socialscienceregistry.org. The hypothesis registry contains 13 examples of pre-analysis plans, primarily from economists doing randomized controlled trials in developing country settings, but also from a large-scale policy natural experiment using the Medicaid program in Oregon. Additional pre-analysis plans from the Oregon experiment are available here.
Improving Your Statistical Inference [Dynamic Documents and Coding Practices, Issues with transparency and reproducibility, Power analysis, Psychology, Statistical Literacy]
This course aims to help you to draw better statistical inferences from empirical research. Students discuss how to correctly interpret p-values, effect sizes, confidence intervals, Bayes Factors, and likelihood ratios, and how these statistics answer different questions you might be interested in. Then, they learn how to design experiments where the false positive rate is controlled, and how to decide upon the sample size for a study, for example in order to achieve high statistical power. Subsequently, students learn how to interpret evidence in the scientific literature given widespread publication bias, for example by learning about p-curve analysis. Finally, the course discusses how to do philosophy of science, theory construction, and cumulative science, including how to perform replication studies, why and how to pre-register an experiment, and how to share results following Open Science principles.
Nicebread [Data Management and De-identification, Data Visualization, Dynamic Documents and Coding Practices, Interdisciplinary, Issues with transparency and reproducibility, Meta-Analyses, Open Publishing, Power analysis, Pre-Analysis Plans, Preprints, Psychology, Registries, Replications, Results-Blind Review & Registered Reports, Transparent Reporting, Version Control]
Dr. Felix Schönbrodt’s blog promoting research transparency and open science.
TextThresher
TextThresher is mass collaboration software that allows researchers to direct hundreds of volunteers, working through the internet, to label tens of thousands of text documents according to all the concepts vital to researchers’ theories and questions. With TextThresher, projects that would have required a decade of effort, and the close training of wave after wave of research assistants, can be completed in about a year and a half online. The project will likely begin beta-testing the software in late 2017, with plans to release it to the general public in early 2018.
Jupyter Notebooks [Data Visualization, Interdisciplinary, Replications, Statistics and Data Science, Version Control]
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
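Under the hood, a notebook file (.ipynb) is ordinary JSON: a list of cells mixing markdown prose with executable code. A minimal sketch (filename and cell contents are made up for illustration) that writes a valid two-cell notebook:

```python
import json

# A notebook is plain JSON: metadata plus a list of cells.  Markdown cells
# carry prose; code cells carry live, re-executable code.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Analysis\n", "Data cleaning, then a quick summary."]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [],
         "source": ["import statistics\n", "statistics.mean([1, 2, 3])"]},
    ],
}

# Write the file; Jupyter can open and run it directly.
with open("analysis.ipynb", "w") as f:
    json.dump(notebook, f, indent=1)
```

Because the format is plain text, notebooks also work well with version control, which is one reason they appear under that tag above.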
Docker [Data Visualization, Interdisciplinary, Replications, Version Control]
Docker is the world’s leading software container platform. Developers use Docker to eliminate “works on my machine” problems when collaborating on code with co-workers. Operators use Docker to run and manage apps side-by-side in isolated containers to get better compute density. Enterprises use Docker to build agile software delivery pipelines to ship new features faster, more securely and with confidence for both Linux and Windows Server apps.
DeclareDesign [Dynamic Documents and Coding Practices, Interdisciplinary, Political Science, Power analysis, Pre-Analysis Plans, Statistics and Data Science]
DeclareDesign is statistical software to aid researchers in characterizing and diagnosing research designs — including experiments, quasi-experiments, and observational studies. DeclareDesign consists of a core package, as well as three companion packages that stand on their own but can also be used to complement the core package: randomizr: Easy-to-use tools for common forms of random assignment and sampling; fabricatr: Tools for fabricating data to enable frontloading analysis decisions in social science research; estimatr: Fast estimators for social science research.
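A rough sense of what "characterizing and diagnosing" a design means can be sketched in Python (DeclareDesign itself is an R package; every function name below is hypothetical): declare how outcomes and assignment arise, then simulate repeatedly to diagnose the bias and power of the estimator.

```python
import random
import statistics

def simulate_once(n=100, effect=0.3):
    """One run of a two-arm experiment: randomize, realize outcomes, estimate."""
    treated = set(random.sample(range(n), n // 2))  # complete random assignment
    y = [random.gauss(effect if i in treated else 0.0, 1.0) for i in range(n)]
    treat_mean = statistics.mean(y[i] for i in range(n) if i in treated)
    ctrl_mean = statistics.mean(y[i] for i in range(n) if i not in treated)
    return treat_mean - ctrl_mean                   # difference-in-means estimate

def diagnose(sims=2000, n=100, effect=0.3):
    """Diagnose the design by simulation: bias and power of the estimator."""
    se = 2 / n ** 0.5                               # known-variance SE of the difference
    ests = [simulate_once(n, effect) for _ in range(sims)]
    bias = statistics.mean(ests) - effect
    power = sum(abs(e) > 1.96 * se for e in ests) / sims
    return bias, power
```

Running `diagnose()` over candidate sample sizes or effect sizes is the simulation-based analogue of the design diagnosis DeclareDesign automates.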
NeuroChambers [Issues with transparency and reproducibility, Open Publishing, Power analysis, Pre-Analysis Plans, Psychology, Replications, Results-Blind Review & Registered Reports, Transparent Reporting]
Chris Chambers is a psychologist and neuroscientist at the School of Psychology, Cardiff University. He created this blog after taking part in a debate about science journalism at the Royal Institution in March 2012. The aim of the blog is to give readers some insights from the trenches of science. He discusses a range of science-related issues and may even give up a trade secret or two.
The New Statistics (+OSF Learning Page) [Data Management and De-identification, Dynamic Documents and Coding Practices, Interdisciplinary, Meta-Analyses, Open Publishing, Power analysis, Pre-Analysis Plans, Psychology, Replications, Statistical Literacy, Statistics and Data Science, Transparent Reporting, Version Control]
This OSF project helps organize resources for teaching the “New Statistics” — an approach that emphasizes asking quantitative questions, focusing on effect sizes, using confidence intervals to express uncertainty about effect sizes, using modern data visualizations, seeking replication, and using meta-analysis as a matter of course.
Datavyu [Data Management and De-identification, Data Visualization, Psychology]
An Open Data Library for Developmental Science that allows users to decide how they want to code video, audio, physiology, motion, or eye tracking data. Power users can write scripts in the Ruby programming language to extend Datavyu’s functionality.
Databrary [Data Management and De-identification, Data Visualization, Dynamic Documents and Coding Practices, Psychology]
Databrary is a video data library for developmental science. Anyone collecting shareable research data will be able to store and organize their data within Databrary after completing the registration process.
JASP [Dynamic Documents and Coding Practices, Meta-Analyses, Statistical Literacy, Statistics and Data Science, Version Control]
JASP is a cross-platform software program with a state-of-the-art graphical user interface. The JASP interface allows you to conduct statistical analyses in seconds, and without having to learn programming or risking a programming mistake. JASP is statistically inclusive as it offers both frequentist and Bayesian analysis methods. Open source and free of charge.
p-uniform [Interdisciplinary, Meta-Analyses, Metascience (Methods and Archival Science)]
The p-uniform package provides meta-analysis methods that correct for publication bias. Three methods are currently included in the package. The p-uniform method can be used for estimating effect size, testing the null hypothesis of no effect, and testing for publication bias. The second method in the package is the hybrid method, a meta-analysis method for combining an original study and a replication while taking into account the statistical significance of the original study. The p-uniform and hybrid methods are based on the statistical theory that the distribution of p-values is uniform conditional on the population effect size. The third method in the package is the Snapshot Bayesian Hybrid Meta-Analysis Method. This method computes posterior probabilities for four true effect sizes (no, small, medium, and large) based on an original study and replication while taking into account publication bias in the original study. The method can also be used for computing the required sample size of the replication, akin to power analysis in null hypothesis significance testing.
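The uniformity property underlying these methods can be checked with a short simulation. This is a minimal sketch of the simplest special case, a population effect of zero, where the one-sided p-value of a z-test is uniform on [0, 1]; none of this code belongs to the p-uniform package itself.

```python
import math
import random

def one_sided_p(z):
    """One-sided p-value of a z statistic: P(Z >= z) under the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

random.seed(42)
# With a true effect of zero, the z statistic is standard normal, so its
# p-value is uniform on [0, 1]: the mean is ~0.5 and ~5% of draws fall below .05.
pvals = [one_sided_p(random.gauss(0, 1)) for _ in range(100_000)]
mean_p = sum(pvals) / len(pvals)
share_below_05 = sum(p < .05 for p in pvals) / len(pvals)
```

p-uniform exploits the general version of this fact, conditioning on a nonzero population effect, to estimate effect sizes from only the significant p-values.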
p-curve [Dynamic Documents and Coding Practices, Issues with transparency and reproducibility, Metascience (Methods and Archival Science), Power analysis, Statistics and Data Science]
P-curve is a tool for determining whether reported effects in the literature are true or merely reflect selective reporting. The p-curve is the distribution of statistically significant p-values for a set of studies (ps < .05). Because only true effects are expected to generate right-skewed p-curves, containing more low (.01s) than high (.04s) significant p-values, only right-skewed p-curves are diagnostic of evidential value. By telling us whether we can rule out selective reporting as the sole explanation for a set of findings, p-curve offers a solution to the age-old inferential problems caused by file-drawers of failed studies and analyses.
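The right-skew claim is easy to verify by simulation. A minimal sketch, assuming simple z-tests and a hypothetical true effect that shifts the z statistic by 2; this code is illustrative only and is not part of the p-curve app:

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value of a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def significant_p_curve(effect, studies=50_000):
    """Significant (p < .05) p-values from many z-tests with a given true effect."""
    ps = (two_sided_p(random.gauss(effect, 1)) for _ in range(studies))
    return [p for p in ps if p < .05]

random.seed(7)
curve = significant_p_curve(effect=2.0)                # studies of a real effect
share_01s = sum(p < .01 for p in curve) / len(curve)   # "low" significant p-values
share_04s = sum(p > .04 for p in curve) / len(curve)   # "high" significant p-values
# A true effect piles significant p-values up near zero (right skew), so
# share_01s comes out far larger than share_04s.  With effect=0.0 instead,
# the significant p-values are uniform and each 0.01-wide bin holds ~20%.
```
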
DMAS [Economics and Finance, Interdisciplinary, Meta-Analyses]
The Distributed Meta-Analysis System is an online tool to help scientists analyze, explore, combine, and communicate results from existing empirical studies. Its primary purpose is to support meta-analyses by providing a database of empirically estimated models and methods to integrate their results. The current version supports a range of tools useful for analyzing empirical climate-impact results, but its creators intend to expand its applicability to other fields, including the social sciences, medicine, ecology, and geophysics.
Metalab [Data Visualization, Linguistics, Meta-Analyses, Metascience (Methods and Archival Science), Power analysis, Psychology]
MetaLab is a research tool for aggregating across studies in the language acquisition literature. Currently, MetaLab contains 887 effect sizes across meta-analyses in 13 domains of language acquisition, based on data from 252 papers covering 11,363 subjects. These studies can be used to obtain better estimates of effect sizes across different domains, methods, and ages. Using the power calculator, researchers can use these estimates to plan appropriate sample sizes for prospective studies. More generally, MetaLab can be used as a theoretical tool for exploring patterns in development across language acquisition domains.
statcheck [Interdisciplinary, Metascience (Methods and Archival Science), Psychology, Replications, Transparent Reporting]
statcheck is an R package that checks for errors in statistical reporting in APA-formatted documents. It can help estimate the prevalence of reporting errors and is a tool to check your own work before submitting. The package can be used to automatically extract statistics from articles and recompute p-values. It is also available as a web app.
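A toy sketch of the extract-and-recompute idea, not statcheck's actual implementation (which is written in R and also handles t, F, r, and chi-square statistics): pull a reported test statistic out of APA-style text with a regular expression, recompute its p-value, and compare at the reported precision. The example sentence is invented.

```python
import re
from statistics import NormalDist

# Hypothetical APA-style result sentence to check.
text = "The effect was significant, z = 2.20, p = .03."

m = re.search(r"z\s*=\s*([\d.]+),\s*p\s*=\s*(\.\d+)", text)
z_stat, p_reported = float(m.group(1)), float(m.group(2))

p_recomputed = 2 * (1 - NormalDist().cdf(z_stat))  # two-sided p for the z statistic
consistent = round(p_recomputed, 2) == p_reported  # matches at reported precision?
```

Here the recomputed value (~.028) rounds to the reported .03, so the sentence would pass the check; a mismatch at the reported precision is what statcheck flags as a reporting error.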
pcpanel [Economics and Finance, Power analysis, Pre-Analysis Plans, Statistics and Data Science]
This package performs power calculations for randomized experiments that use panel data. Unlike the existing programs “sampsi” and “power”, this package accommodates arbitrary serial correlation. The program “pc_simulate” performs simulation-based power calculations using a pre-existing dataset (stored in memory), and accommodates cross-sectional, multi-wave panel, difference-in-differences, and ANCOVA designs. The program “pc_dd_analytic” performs analytical power calculations for a difference-in-differences experimental design, applying the formula derived in Burlig, Preonas, and Woerman (2017) that is robust to serial correlation. Users may either input parameters to characterize the assumed variance-covariance structure of the outcome variable, or allow the subprogram “pc_dd_covar” to estimate the variance-covariance structure from pre-existing data.
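Why serial correlation matters for these power calculations can be seen in a small simulation: with AR(1) errors, the variance of a unit's time-averaged error is much larger than the i.i.d. formula implies, so power calculations that ignore serial correlation overstate power. This Python sketch is illustrative only; pcpanel itself is a Stata package and the function names below are made up.

```python
import random
import statistics

def ar1_series(T, rho, sd=1.0):
    """Stationary AR(1) errors with unit variance: e_t = rho*e_{t-1} + v_t."""
    e = [random.gauss(0, sd)]
    for _ in range(T - 1):
        e.append(rho * e[-1] + random.gauss(0, sd * (1 - rho ** 2) ** 0.5))
    return e

def var_of_time_mean(rho, T=20, sims=20_000):
    """Monte Carlo variance of the within-unit time mean of the error process."""
    return statistics.variance(
        statistics.mean(ar1_series(T, rho)) for _ in range(sims)
    )
```

With T = 20 periods, `var_of_time_mean(0.0)` comes out near the i.i.d. value 1/20 = 0.05, while `var_of_time_mean(0.5)` is roughly three times larger; a power calculation assuming independence would therefore be badly optimistic, which is the gap pcpanel is built to close.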
Handbook of the Modern Development Specialist [Data Management and De-identification, International Development]
Created by the Responsible Data Forum, this handbook is offered as a first attempt to understand what responsible data means in the context of international development programming. The authors have taken a broad view of development, opting not to be prescriptive about who the perfect “target audience” for this effort is within the space. This book builds on a number of resources and strategies developed in academia, human rights and advocacy, but aims to focus on international development practitioners. The handbook includes chapters on project design, data management, collection, analysis, sharing, and more.
Dataverse [Data Repositories, Interdisciplinary]
Dataverse is an open source web application to share, preserve, cite, explore, and analyze research data. It facilitates making data available to others, and allows you to replicate others’ work more easily. Researchers, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility.
OSF [Data Management and De-identification, Interdisciplinary, Registries, Version Control]
Open Science Framework (OSF) is part version control system, part data repository, part collaboration software that allows researchers to move study materials to the cloud, share and find materials, detail individual contributions, make research design more visible, and register materials to certify research design was not modified to alter outcomes. To increase workflow flexibility OSF offers a system where researchers can register a description of their study and its goals. The OSF emphasizes versatility with a very wide range of tools and features including add-ons from other related sites such as Dataverse and Github. Uploaded materials can also be archived and receive a Digital Object Identifier (DOI) or Archival Resource Key (ARK).
Dryad [Data Management and De-identification, Engineering and Computer Science, Health Sciences, Interdisciplinary, Life Sciences, Metascience (Methods and Archival Science), Other Social Sciences, Statistics and Data Science]
Dryad is a curated repository of data underlying peer-reviewed scientific and medical literature, particularly data for which no specialized repository exists. All material in Dryad is associated with a scholarly publication. Its notable features include easy integration into the manuscript submission workflow of its partner journals, the flexibility to make data privately available during peer review, and allowing submitters to set limited-term embargoes post-publication.
ICPSR [Data Repositories, Other Social Sciences, Political Science]
The Inter-university Consortium for Political and Social Research (ICPSR) maintains and provides access to a vast archive of social science data for research and instruction (over 10,000 discrete studies and surveys with more than 65,000 datasets). ICPSR has been archiving data since 1962.
Qualitative Data Repository [Data Management and De-identification, Interdisciplinary, Political Science]
QDR selects, ingests, curates, archives, manages, durably preserves, and provides access to digital data used in qualitative and multi-method social inquiry. The repository develops and publicizes common standards and methodologically informed practices for these activities, as well as for the reusing and citing of qualitative data. Four beliefs underpin the repository’s mission: data that can be shared and reused should be; evidence-based claims should be made transparently; teaching is enriched by the use of well-documented data; and rigorous social science requires common understandings of its research methods.
re3data.org [Data Repositories, Interdisciplinary, Replications]
The Registry of Research Data Repositories (re3data.org) is a global registry of research data repositories that covers research data repositories from different academic disciplines. It presents repositories for the permanent storage and access of data sets to researchers, funding bodies, publishers and scholarly institutions. re3data.org promotes a culture of sharing, increased access and better visibility of research data. The registry went live in autumn 2012 and is funded by the German Research Foundation (DFG).
Scan.R [Data Management and De-identification, Interdisciplinary]
Scan.R searches all Stata (.dta), SAS (.sas7bdat), and comma-separated values (.csv) files found in the specified directory for variables that may contain personally identifiable information (PII) using strings that commonly appear as part of variable names or labels that contain PII. (Note: Scan.R does not search labels in .csv files.) Results are displayed to the screen and saved to a comma-separated values file in the current working directory containing the variables and data flagged as potential PII.
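The core mechanic, matching column names against a list of strings that commonly indicate PII, can be sketched in a few lines of Python (hypothetical hint list and function name; Scan.R itself is written in R and also inspects variable labels and data values):

```python
import csv
import io

# Hypothetical, simplified hint list; Scan.R ships its own search strings.
PII_HINTS = ("name", "address", "phone", "email", "ssn", "birth", "gps")

def flag_pii_columns(csv_text):
    """Return the header names in a .csv that commonly indicate PII."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [col for col in header
            if any(hint in col.lower() for hint in PII_HINTS)]
```

For example, `flag_pii_columns("id,respondent_name,income\n1,Ana,500\n")` flags only `respondent_name`. Name-based matching is deliberately conservative: it surfaces candidates for a human to review rather than certifying a file as PII-free.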
Mendeley Data [Data Repositories, Data Visualization]
Mendeley Data is a multidisciplinary, free-to-use open research data repository, where you can upload and share data files up to 10GB so they are archived, preserved and findable for the long-term. To ensure that research data stands the test of time, each version of a dataset is given a unique DOI, and permanently archived with DANS (Data archiving and Networking Services), ensuring that every dataset and citation will be valid in perpetuity.
Transparent and Open Social Science Research [Dynamic Documents and Coding Practices, Issues with transparency and reproducibility, Meta-Analyses, Pre-Analysis Plans, Registries, Replications, Statistical Literacy, Transparent Reporting]
Demand is growing for evidence-based policymaking, but there is also growing recognition in the social science community that limited transparency and openness in research have contributed to widespread problems. With this course created by BITSS, you can explore the causes of limited transparency in social science research, as well as tools to make your own work more open and reproducible.
You can access the course videos for self-paced learning on the BITSS YouTube channel here (also available with subtitles in French here). You can also enroll for free during curated course runs on the FutureLearn platform.
Manual of Best Practices [Dynamic Documents and Coding Practices, Issues with transparency and reproducibility, Pre-Analysis Plans, Transparent Reporting]
The Manual of Best Practices, written by Garret Christensen (BITSS), is a working guide to the latest best practices for transparent quantitative social science research. The manual is also available, and occasionally updated, on GitHub. For suggestions or feedback, contact firstname.lastname@example.org.
Curate Science [Issues with transparency and reproducibility, Metascience (Methods and Archival Science), Psychology, Replications, Sociology]
Curate Science is a crowd-sourced platform to track, organize, and interpret replications of published findings in the social sciences. Curated replication study characteristics include links to PDFs, open/public data, open/public materials, pre-registered protocols, independent variables (IVs), outcome variables (DVs), replication type, replication design differences, and links to associated evidence collections that feature meta-analytic forest plots.
Open Science Training Initiative [Data Management and De-identification, Interdisciplinary, Version Control]
The Open Science Training Initiative (OSTI) provides a series of lectures on open science, data management, licensing, and reproducibility for use with graduate students and postdoctoral researchers. The lectures can be used individually as one-off lectures on aspects of open science, or integrated into existing course curricula. Content, slides, and advice sheets for the lectures and other training materials are gradually being released on the GitHub repository as official release versions become available.
Swirl [Data Visualization, Interdisciplinary]
swirl is an R package that teaches R programming and data science interactively, at the user's own pace, directly in the R console.
Data Science Certificate [Data Visualization, Engineering and Computer Science, Interdisciplinary, Statistical Literacy, Statistics and Data Science]
The Data Science Certificate, offered on Coursera, is a set of nine classes covering the concepts and tools needed to analyze data, from asking the right kinds of questions to making inferences and publishing results.
Reproducible Research [Data Management and De-identification, Interdisciplinary, Statistical Literacy, Statistics and Data Science]
Reproducible Research, taught by Roger D. Peng, Jeff Leek, and Brian Caffo of Johns Hopkins University, is a course on Coursera that teaches methods to organize data analysis so that it is reproducible and accessible to others. Students learn to write a document using R Markdown, integrate live R code into a literate statistical program, and compile R Markdown documents using knitr and related tools.
Implementing Reproducible Research [Dynamic Documents and Coding Practices, Statistics and Data Science, Transparent Reporting, Version Control]
Implementing Reproducible Research by Victoria Stodden, Friedrich Leisch, and Roger D. Peng covers many of the elements necessary for conducting and distributing reproducible research. The book focuses on the tools, practices, and dissemination platforms for ensuring reproducibility in computational science.
The Workflow of Data Analysis Using Stata [Data Management and De-identification, Interdisciplinary, Statistical Literacy, Statistics and Data Science]
The Workflow of Data Analysis Using Stata, by J. Scott Long, explains how to manage all aspects of data analysis, including cleaning data; creating, renaming, and verifying variables; performing and presenting statistical analyses; and producing replicable results.
Impact Evaluation Replication Programme [Economics and Finance, Political Science, Public Policy, Replications]
The International Initiative for Impact Evaluation (3ie) funds replication studies through its Replication Grant programme. Funding requests are reviewed on a rolling basis, and high-quality applicants are invited to submit full proposals.
Edawax
Edawax conducts meta-research on a variety of topics related to research practices, including an analysis of the data-sharing policies of peer-reviewed journals, with the hope of 1) identifying the obstacles to performing replications and 2) using those insights to develop resources and infrastructure that facilitate replications and meta-analysis.
EGAP Registry [Economics and Finance, Political Science, Pre-Analysis Plans, Public Policy, Registries, Sociology]
The Evidence in Governance and Politics (EGAP) Registry focuses on designs for experiments and observational studies in governance and politics. The registry allows users to submit an array of information via an online form. Registered studies can be viewed in the form of a pdf on the EGAP site. The EGAP registry is straightforward and emphasizes simplicity for registering impact evaluations.
ClinicalTrials.gov [Health Sciences, Pre-Analysis Plans, Registries]
ClinicalTrials.gov is a registry and database that provides information on publicly and privately funded clinical trials, maintained by the National Library of Medicine at the National Institutes of Health. Studies are often submitted to the site when they begin and are regularly updated along the way. ClinicalTrials.gov is the largest trial registry, with over 250,000 studies from across the world.