The BITSS Resource Library contains resources for learning, teaching, and practicing research transparency and reproducibility, including curricula, slide decks, books, guidelines, templates, software, and other tools. All resources are categorized by i) topic, ii) type, and iii) discipline. Filter results by applying criteria along these parameters or use the search bar to find what you’re looking for.
Know of a great resource that we haven’t included or have questions about the existing resources? Email us!
Reproducible Data Science with Python Data VisualizationInterdisciplinaryReproducibilityStatistics and Data ScienceVersion Control
Written by Valentin Danchev, “Reproducible Data Science with Python” is a textbook that uses real-world social data sets related to the COVID-19 pandemic to provide an accessible introduction to open, reproducible, and ethical data analysis using hands-on Python coding, modern open-source computational tools, and data science techniques. Topics include open reproducible research workflows, data wrangling, exploratory data analysis, data visualization, pattern discovery (e.g., clustering), prediction & machine learning, causal inference, and network analysis.
Dataverse: Research Transparency through Data Sharing Data RepositoriesReproducibilityTransparency
Find slides from a presentation by Mercè Crosas titled “Dataverse: Research Transparency through Data Sharing”.
Software Carpentry Data Management and De-identificationDynamic Documents and Coding PracticesEngineering and Computer ScienceInterdisciplinaryStatistics and Data ScienceVersion Control
Software Carpentry offers online tutorials for data analysis including Version Control with Git, Using Databases and SQL, Programming with Python, Programming with R and Programming with MATLAB.
Data Carpentry Lessons Data Management and De-identificationInterdisciplinary
Developed by Data Carpentry, these lessons can be used across the social sciences to teach data cleaning, management, analysis, and visualization. R is the base language for instruction, and there are no pre-requisites in terms of prior knowledge about this topic.
Conda Data VisualizationInterdisciplinaryStatistics and Data Science
Whole Tale Data Management and De-identificationData VisualizationInterdisciplinaryReplicationsStatistics and Data ScienceVersion Control
Whole Tale is an infrastructure that allows users to share data, methods and analysis protocols, and final research outputs in a single, executable object (“living publication” or “tale”) alongside any research publication. Learn more here.
Course Syllabi for Open and Reproducible Methods Anthropology, Archaeology, and EthnographyData RepositoriesData VisualizationDynamic Documents and Coding PracticesEconomics and FinanceEngineering and Computer ScienceHealth SciencesHumanitiesInterdisciplinaryIssues with transparency and reproducibilityLife SciencesLinguisticsMeta-AnalysesMetascience (Methods and Archival Science)Open PublishingOther Social SciencesPolitical SciencePower analysisPre-Analysis PlansPsychologyPublic PolicyRegistriesReplicationsSociologyStatistical LiteracyStatistics and Data ScienceTransparent ReportingVersion Control
A collection of course syllabi from any discipline featuring content to examine or improve open and reproducible research practices. Housed on the OSF.
rOpenSci Packages Data Management and De-identificationDynamic Documents and Coding PracticesInterdisciplinaryMeta-AnalysesMetascience (Methods and Archival Science)Power analysisReplicationsStatistics and Data ScienceVersion Control
These packages are carefully vetted, staff- and community-contributed R software tools that lower barriers to working with scientific data sources and data that support research applications on the web.
Nicebread Data Management and De-identificationData VisualizationDynamic Documents and Coding PracticesInterdisciplinaryIssues with transparency and reproducibilityMeta-AnalysesOpen PublishingPower analysisPre-Analysis PlansPreprintsPsychologyRegistriesReplicationsResults-Blind Review & Registered ReportsTransparent ReportingVersion Control
Dr. Felix Schönbrodt’s blog promoting research transparency and open science.
Jupyter Notebooks Data VisualizationInterdisciplinaryReplicationsStatistics and Data ScienceVersion Control
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
Docker Data VisualizationInterdisciplinaryReplicationsVersion Control
Docker is the world’s leading software container platform. Developers use Docker to eliminate “works on my machine” problems when collaborating on code with co-workers. Operators use Docker to run and manage apps side-by-side in isolated containers to get better compute density. Enterprises use Docker to build agile software delivery pipelines to ship new features faster, more securely and with confidence for both Linux and Windows Server apps.
The New Statistics (+OSF Learning Page) Data Management and De-identificationDynamic Documents and Coding PracticesInterdisciplinaryMeta-AnalysesOpen PublishingPower analysisPre-Analysis PlansPsychologyReplicationsStatistical LiteracyStatistics and Data ScienceTransparent ReportingVersion Control
This OSF project helps organize resources for teaching the “New Statistics” — an approach that emphasizes asking quantitative questions, focusing on effect sizes, using confidence intervals to express uncertainty about effect sizes, using modern data visualizations, seeking replication, and using meta-analysis as a matter of course.
Datavyu Data Management and De-identificationData VisualizationPsychology
An Open Data Library for Developmental Science that allows users to decide how they want to code video, audio, physiology, motion, or eye tracking data. Power users can write scripts in the Ruby programming language to extend Datavyu’s functionality.
Databrary Data Management and De-identificationData VisualizationDynamic Documents and Coding PracticesPsychology
Databrary is a video data library for developmental science. Anyone collecting shareable research data will be able to store and organize their data within Databrary after completing the registration process.
Metalab Data VisualizationLinguisticsMeta-AnalysesMetascience (Methods and Archival Science)Power analysisPsychology
MetaLab is a research tool for aggregating across studies in the language acquisition literature. Currently, MetaLab contains 887 effect sizes across meta-analyses in 13 domains of language acquisition, based on data from 252 papers collecting 11363 subjects. These studies can be used to obtain better estimates of effect sizes across different domains, methods, and ages. Using our power calculator, researchers can use these estimates to plan appropriate sample sizes for prospective studies. More generally, MetaLab can be used as a theoretical tool for exploring patterns in development across language acquisition domains.
Mendeley Data Data RepositoriesData Visualization
Mendeley Data is a multidisciplinary, free-to-use open research data repository, where you can upload and share data files up to 10GB so they are archived, preserved and findable for the long-term. To ensure that research data stands the test of time, each version of a dataset is given a unique DOI, and permanently archived with DANS (Data archiving and Networking Services), ensuring that every dataset and citation will be valid in perpetuity.
Swirl Data VisualizationInterdisciplinary
Data Science Certificate Data VisualizationEngineering and Computer ScienceInterdisciplinaryStatistical LiteracyStatistics and Data Science
Data Science Certificate offered on Coursera, is set of nine classes that cover the concepts and tools needed to analyze data starting with asking the right kinds of questions to making inferences and publishing results.