Rules for open data

We know that sharing data improves the integrity of scientific research, helps researchers who want to reuse the data or replicate the study, and may even increase the likelihood that a paper will be cited. But how can researchers make the sharing process efficient and present their data in a way that is useful? Astronomer Alyssa Goodman and fourteen of her colleagues came up with ten rules for open data. I briefly go through these rules in this video and elaborate on them below.

“Ten Simple Rules for the Care and Feeding of Scientific Data” was written by a group of 15 researchers who wanted to help scientists “ensure that their data and associated analyses continue to be of value and be recognized.” Today, many studies cannot be fully reproduced or verified because their data are unavailable or poorly described.

Goodman and her colleagues’ 10 rules are as follows:

1. Love your data, and help others love it, too. If you make your data easily accessible, others are more likely to do the same. The authors encourage scientists to “cherish, document, and publish” their data, and to encourage others to follow.
2. Share your data online, with a permanent identifier. Authors should try to deposit their data in an archive that acts as the “go to” place for their field. A good host keeps data accessible and long-lasting.
3. Conduct science with a particular level of reuse in mind. The authors use the word “provenance,” defined as “the sum of all the processes, people, and documents involved in generating or otherwise influencing or delivering a piece of information,” to describe a study’s level of reusability. Better documentation raises the quality of provenance and the chance that the data will be reused. Data reuse is most feasible when the data, the metadata, and information about the process of generating the data are all provided. Thus, scientists should plan according to the level of reuse they want their experiment to have, and adopt the appropriate standard formats.
4. Publish workflow as context. “Publishing a description of your processing steps offers essential context for interpreting and reusing data,” the authors write. Workflow is a term that describes a project’s data collection methods and analysis. While some workflow software exists, the authors suggest that, at a minimum, researchers disclose a simple sketch of the data flow that indicates how results were generated.
5. Link your data to your publications as often as possible. Data can include anything from tables and spreadsheets to images and code. Whatever form a study’s data take, the more of it that is made accessible, and the earlier, the better. Scientists are encouraged to embed citations in their data and code.
6. Publish your code. Although your code may not be perfect, publishing it can be important for replicating and understanding your results.
7. State how you want to get credit. Goodman et al. simply suggest making known how you would like to be acknowledged for your data.
8. Foster and use data repositories. It is important to find a good place to share data and code. Often, there will be an existing repository within a field. If there isn’t, the authors encourage asking information specialists or librarians within that field.
9. Reward colleagues who share their data properly. Rewarding those who share data and code, and acknowledging colleagues’ good practices, will encourage the continuation and development of these habits.
10. Be a booster for data science. As scientists, we should all help push our institutions towards better, more reproducible research, and we should advocate for improved data sharing. We should pass on our knowledge to graduate and undergraduate students through classes and workshops so that more will see the value of “well-loved data.”
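
Several of these rules (a permanent identifier, provenance, a workflow sketch, and a stated way to get credit) come down to keeping a small amount of descriptive metadata next to the data itself. The sketch below shows one way that might look in Python. It is an illustration only, not something taken from the editorial: the function names, the field names, and the ".provenance.json" file naming are all hypothetical, and a real project should follow the metadata standard of its field's repository.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_provenance_record(data_path, steps, citation, identifier=None):
    """Describe a data file so others can verify, reuse, and credit it.

    All field names below are hypothetical, chosen for illustration.
    """
    data_path = Path(data_path)
    content = data_path.read_bytes()
    return {
        "file": data_path.name,
        "size_bytes": len(content),
        # A checksum lets a reader confirm they hold the same bytes.
        "sha256": hashlib.sha256(content).hexdigest(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        # Permanent identifier (for example a DOI) once the archive issues one.
        "identifier": identifier,
        # Plain-language sketch of how the data were produced.
        "workflow": list(steps),
        # How the authors would like to be credited.
        "preferred_citation": citation,
    }


def write_sidecar(data_path, record):
    """Save the record next to the data file as <name>.provenance.json."""
    data_path = Path(data_path)
    sidecar = data_path.with_name(data_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Calling build_provenance_record on a data file, with a list of processing steps and a citation string, and then passing the result to write_sidecar leaves a small JSON file beside the data that can be deposited together with it.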

The whole editorial is worth reading and is, not coincidentally, open access. In addition to these rules, the authors give an extensive list of useful resources including open access repositories and software to help manage a more reproducible workflow.

Reference

Goodman, Alyssa, et al. (2014). “Ten Simple Rules for the Care and Feeding of Scientific Data”, PLoS Computational Biology, 10(4), e1003542.
