Reproducible Data Science with Python Data Visualization+

Written by Valentin Danchev, “Reproducible Data Science with Python” is a textbook that uses real-world social data sets related to the COVID-19 pandemic to provide an accessible introduction to open, reproducible, and ethical data analysis using hands-on Python coding, modern open-source computational tools, and data science techniques. Topics include open reproducible research workflows, data wrangling, exploratory data analysis, data visualization, pattern discovery (e.g., clustering), prediction & machine learning, causal inference, and network analysis.

 


Software Carpentry Data Management+

Software Carpentry offers online tutorials for data analysis including Version Control with Git, Using Databases and SQL, Programming with Python, Programming with R and Programming with MATLAB.

Web Plot Digitizer Data Management+

WebPlotDigitizer is an app that extracts numerical data from images of charts and plots.

Data Carpentry Lessons Data Management+

Developed by Data Carpentry, these lessons can be used across the social sciences to teach data cleaning, management, analysis, and visualization. R is the base language for instruction, and no prior knowledge of the topic is required.

Conda Data Visualization+

Conda is an open source package management system and environment management system that runs on Windows, macOS, and Linux. Conda installs, runs, and updates packages and their dependencies, and it works with multiple languages, including Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, and FORTRAN.

Whole Tale Data Management+

Whole Tale is an infrastructure that allows users to share data, methods and analysis protocols, and final research outputs in a single, executable object (“living publication” or “tale”) alongside any research publication.

Course Syllabi for Open and Reproducible Methods Anthropology, Archaeology, and Ethnography+

A collection of course syllabi from any discipline featuring content to examine or improve open and reproducible research practices. Housed on the OSF.

rOpenSci Packages Data Management+

These packages are carefully vetted, staff- and community-contributed R software tools that lower barriers to working with scientific data sources and data that support research applications on the web.

Nicebread Data Management+

Dr. Felix Schönbrodt’s blog promoting research transparency and open science.

Jupyter Notebooks Data Visualization+

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

Docker Data Visualization+

Docker is the world’s leading software container platform. Developers use Docker to eliminate “works on my machine” problems when collaborating on code with co-workers. Operators use Docker to run and manage apps side-by-side in isolated containers to get better compute density. Enterprises use Docker to build agile software delivery pipelines to ship new features faster, more securely and with confidence for both Linux and Windows Server apps.


The New Statistics (+OSF Learning Page) Data Management+

This OSF project helps organize resources for teaching the “New Statistics” — an approach that emphasizes asking quantitative questions, focusing on effect sizes, using confidence intervals to express uncertainty about effect sizes, using modern data visualizations, seeking replication, and using meta-analysis as a matter of course.

 

Datavyu Data Management+

An Open Data Library for Developmental Science that allows users to decide how they want to code video, audio, physiology, motion, or eye tracking data. Power users can write scripts in the Ruby programming language to extend Datavyu’s functionality.

 

Databrary Data Management+

Databrary is a video data library for developmental science. Anyone collecting shareable research data will be able to store and organize their data within Databrary after completing the registration process.

rpsychologist Data Management+

Kristoffer Magnusson’s blog about R, Statistics, Psychology, Open Science, and Data Visualization.

Metalab Data Visualization+

MetaLab is a research tool for aggregating across studies in the language acquisition literature. Currently, MetaLab contains 887 effect sizes across meta-analyses in 13 domains of language acquisition, based on data from 252 papers collecting 11363 subjects. These studies can be used to obtain better estimates of effect sizes across different domains, methods, and ages. Using our power calculator, researchers can use these estimates to plan appropriate sample sizes for prospective studies. More generally, MetaLab can be used as a theoretical tool for exploring patterns in development across language acquisition domains.
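As an illustration of how an effect size estimate can feed into sample size planning, here is a minimal sketch. It is not MetaLab's power calculator: it assumes a two-group comparison of means, a standardized effect size (Cohen's d), a two-sided test, and a normal approximation. The function name `n_per_group` and its default values are hypothetical choices made for this example.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group for a two-sample comparison.

    Assumptions (not taken from MetaLab): two-sided test, equal group sizes,
    normal approximation to the test statistic.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)


# A medium effect (d = 0.5) at alpha = 0.05 and 80% power
print(n_per_group(0.5))  # 63 under the assumptions above
```

Because the required sample size scales with the inverse square of the effect size, a more accurate effect size estimate can change the planned sample substantially.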


Figshare Data Repositories+

Figshare allows researchers to publish all of their research outputs in an easily citable, sharable and discoverable manner. All file formats can be published, including videos and datasets.

Mendeley Data Data Repositories, Data Visualization

Mendeley Data is a multidisciplinary, free-to-use open research data repository, where you can upload and share data files up to 10GB so they are archived, preserved, and findable for the long term. To ensure that research data stands the test of time, each version of a dataset is given a unique DOI and permanently archived with DANS (Data Archiving and Networked Services), ensuring that every dataset and citation will be valid in perpetuity.


Swirl Data Visualization+

Swirl is a software package for the R programming language that turns the R console into an interactive learning environment. Users receive immediate feedback as they are guided through self-paced lessons in data science and R programming.


Data Science Certificate Data Visualization+

The Data Science Certificate, offered on Coursera, is a set of nine classes that cover the concepts and tools needed to analyze data, from asking the right kinds of questions to making inferences and publishing results.

OpenIntro Statistics Data Management+

OpenIntro Statistics is a free, comprehensive, 400-page online textbook and suite of educational material on statistics and data analysis.