If you post your data, it might actually be good for you!

A 2013 study by Dr. Heather Piwowar and biologist Dr. Todd Vision suggests that sharing data isn’t just good for the scientific research community as a whole, but for individual publishing authors as well. They found a positive correlation between posting data and increased citations. This video discusses their methods, as well as why I’m not entirely convinced this provides a strong incentive to share data.


Data sharing, while attracting growing interest, has yet to become a scientific norm. This may be due, in part, to a common belief that the costs of preparing data and making it widely available are not worth the benefits. However, an article by Heather Piwowar and Todd Vision, titled “Data reuse and the open data citation advantage,” may provide authors an extra incentive to share data. The authors suggest that, on top of allowing for future investigation of past studies and methods, encouraging multiple perspectives regarding data, identifying errors, and improving publication integrity, sharing data leads to higher citation rates.

Measuring that benefit is not straightforward, however. Piwowar and Vision acknowledge that few previous analyses had the statistical power to control for the many variables known to predict citation rates, which has led to uncertain estimates of the “citation benefit.”

In their study of gene expression microarray data, Piwowar and Vision looked at citation rates while controlling for known citation predictors, and also investigated how much data reuse varies. They describe their approach to studying reuse as follows:

“First, we conduct a small-scale manual review of citation contexts to understand the proportion of citations that are made in the context of data reuse. Second, we use attribution through mentions of data accession numbers, rather than citations, to explore patterns in data reuse on a much larger scale.”
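To make the second step more concrete, here is a minimal sketch of counting accession-number mentions across a set of texts. It is an illustration only, not the authors' pipeline: the pattern assumes accessions that look like GEO series identifiers ("GSE" followed by digits), and the function name `count_accession_mentions` and the sample snippets are made up for this example.

```python
import re
from collections import Counter

# Assumption: accessions look like GEO series IDs, e.g. "GSE1234".
ACCESSION_RE = re.compile(r"\bGSE\d+\b")


def count_accession_mentions(documents):
    """Count how many documents mention each accession at least once."""
    counts = Counter()
    for text in documents:
        # A set ensures repeated mentions within one document count once.
        counts.update(set(ACCESSION_RE.findall(text)))
    return counts


# Made-up snippets standing in for full-text articles.
docs = [
    "We reanalysed GSE1234 and GSE5678 using a new pipeline.",
    "Expression data were obtained from GEO (GSE1234).",
    "No public data were used in this study.",
]

mentions = count_accession_mentions(docs)
for accession, n_docs in sorted(mentions.items()):
    print(accession, n_docs)
```

Counting documents rather than raw matches keeps one paper that repeats an accession many times from inflating the apparent reuse of a dataset.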

They conducted their analysis using many factors as covariates, including date of publication, open access status, number of authors, author country, study topic, and more. Additionally, they examined patterns of data reuse.
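To show what "controlling for covariates" means in practice, here is a small simulated sketch. Everything in it is an assumption made for illustration: the data are randomly generated, the effect size is arbitrary, only two covariates are included, and the model is a plain least-squares fit on log citations rather than the authors' actual analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Simulated covariates (illustrative only, not the paper's data).
years_since_pub = rng.uniform(1, 10, n)
n_authors = rng.integers(1, 15, n)

# In this simulation, older papers are less likely to have shared data,
# so paper age confounds a naive open-vs-closed comparison.
p_open = 1 / (1 + np.exp(0.4 * (years_since_pub - 5)))
open_data = rng.random(n) < p_open

TRUE_EFFECT = 0.10  # arbitrary assumed effect of open data on log citations
log_citations = (
    0.5
    + 0.25 * years_since_pub
    + 0.03 * n_authors
    + TRUE_EFFECT * open_data
    + rng.normal(0, 0.5, n)
)


def ols(columns, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Naive comparison: open data indicator only.
naive = ols([open_data], log_citations)[1]
# Adjusted comparison: open data plus the covariates.
adjusted = ols([open_data, years_since_pub, n_authors], log_citations)[1]

print(f"naive estimate:    {naive:.3f}")
print(f"adjusted estimate: {adjusted:.3f}")
```

In this simulation the naive estimate is pulled well away from the assumed effect because paper age is ignored, while the adjusted estimate lands close to it. That is the reason a study like this one includes publication date and similar factors as covariates.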

The authors conclude that there is a robust citation benefit from open data, and a “direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data.” They found that the number of citations a paper received was strongly correlated with its publication date. Overall, papers with openly available data received more citations, even after controlling for variables known to affect citation rates.

Piwowar and Vision also list several factors, including third-party data reuse itself, that may contribute to “open data citation benefits”:

  1. Data Reuse – Papers with available datasets can be used in ways that papers without data cannot, and they may receive additional citations as a result.
  2. Credibility Signalling – The credibility of research findings may be higher for research papers with available data. Such papers may be preferentially chosen as background citations or the foundation of additional research.
  3. Increased Visibility – Third party researchers may be more likely to encounter a paper with available data, either by a direct link from the data or indirectly through cross-promotion.
  4. Early View – When data is made available before a paper is published, some citations may accrue earlier than they would otherwise because of accelerated awareness of the methods, findings, and so on.
  5. Selection Bias – Authors may be more likely to publish data for papers they judge to be their best quality work, because they are particularly proud or confident of the results.

The obstacles the authors faced while gathering citation data suggest that “improvements in tools and practice are needed to make impact tracking easier and more accurate, for day-to-day analyses as well as studies for evidence-based policy.”

While there are positive and negative incentives to data sharing, the authors ultimately assert that, in the transition from “data not shown” to a culture where published data is normalized, sharing data should be seen as a tenet of science, and science as a public resource.

If you want to dive deeper into the material, you can read the whole paper by clicking on the link in the SEE ALSO section at the bottom of this page.


Reference

Piwowar, Heather A., and Todd J. Vision. 2013. “Data Reuse and the Open Data Citation Advantage.” PeerJ 1 (October): e175. doi:10.7717/peerj.175.