by Jennifer Sturdy, BITSS
Communalism, organized skepticism, disinterestedness, and universalism. These scientific norms guide BITSS as an advocate for research transparency. From our perspective, research transparency is the means by which the research ecosystem operationalizes these scientific norms, and fully transparent research demonstrates when these norms are adhered to and when they are not.
Respect for persons, beneficence, and justice. These ethical principles guide regulations for research involving human subjects. And when BITSS advocates for research transparency, we do so alongside these principles. On Monday, January 21, 2019, major revisions to the US regulations that operationalize these three principles went into effect with the “Revised Common Rule,” also known as subpart A of 45 CFR part 46, the Basic Health and Human Services (HHS) Policy for Protection of Human Research Subjects.
We think it’s worth discussing the Revised Common Rule as it has implications for an issue BITSS cares about – research data (and code) transparency. On the one hand, some of the revisions help to facilitate greater research transparency. On the other hand, without formally considering research transparency and reproducibility alongside the three core ethical principles, some issues remain unaddressed. With this in mind, we think it’s worth asking – what if research transparency and reproducibility are considered ethical principles alongside respect for persons, beneficence, and justice?
Tension between research transparency and reproducibility and ethics
Since our early years, BITSS has tried to balance tension between research transparency and reproducibility, the ethical principles that underpin research regulation, and data protection laws (such as the Privacy Act in the US and now the GDPR in the EU).
A prime example of this tension has to do with research data (and code) transparency. For some research, this is a non-issue because the research does not rely on personally identifiable and/or sensitive data on human subjects. But for other research – particularly involving extensive household, school, facility, or other institutional-level surveys – the promise of confidentiality to the human subjects involved serves as an appropriate barrier to sharing the raw data (and code) underlying the analysis.
Depending on whether or not analysis requires use of identifiable data, researchers may be bound by their ethical principles (and regulations) not to share the raw data (and code) underlying their analysis. (A hot topic indeed – recent discussions regarding the US Census and differential privacy provide some insight on this data sharing challenge).
Mitigating (some of) this tension through training and existing regulation
- The reasons for data sharing need to inform data access requirements. For research transparency, there are many reasons to share data: computational reproducibility, extended replication, and new, innovative research questions that were not considered at the time of the original study. But these reasons don’t always call for a one-size-fits-all data access strategy.
- The research ecosystem should consider research data (and code) transparency on a spectrum. Access may be direct, limited, or restricted depending on the requirements of the study, reasons for data sharing, and promises of confidentiality.
- Computational reproducibility should be a condition for any analysis. Researchers therefore must “think early and often” about reproducible workflow. The workflow should be built in a way that maximizes computational reproducibility, even when the data can only be accessed on a restricted basis. This also means that researchers should provide a public description of the steps that others should take to request access.
- For research that engages with human subjects, research data (and code) accessibility should be defined through the informed consent process.
The Revised Common Rule now dovetails with some of the above, at least for research involving human subjects:
- Researchers governed by the Common Rule must tell human subjects explicitly during informed consent whether or not de-identified data will be shared in the future; and
- Regulation now allows researchers to seek broad consent for future sharing of identifiable data for unknown, unspecified research purposes.
Both of these changes are important for research transparency, but of course tension between transparency and ethics persists.
A Common Rule that includes Research Transparency and Reproducibility
When BITSS supports institutions in scaling up research transparency and reproducibility practices, we offer the following suggestions for considering transparency, reproducibility, and ethical principles alongside each other within research guidelines and protocols:
- Researchers should seek informed consent for identifiable data sharing for computational reproducibility as explicitly as for de-identified data sharing. BITSS believes computational reproducibility should be considered a condition for any research analysis. Facilitating access to identifiable data (whether through direct, limited, or restricted access) for the purpose of independent computational reproducibility should be built into the informed consent process as explicitly as de-identified data sharing is built into the Revised Common Rule. If this is not made explicit, researchers might not facilitate this access, or might rely on broad consent to facilitate it.
- When researchers seek and obtain broad consent, it should come with careful consideration of evolving scope and risks, particularly when used for more transparency. While seeking broad consent at the informed consent stage may facilitate greater data sharing, researchers must still balance the goal of data sharing with “do no harm” to the human subjects. There are two main concerns with seeking and obtaining broad consent as a means for data sharing. First, risks may change over time, and therefore consent to participate in research is obtained without knowledge of future risks. Second, the research purpose and scope may have significant influence on human subjects’ decisions to share data. For this reason, even if broad consent is obtained for data sharing, researchers should still make careful choices about whether to use direct, limited, or restricted data access, and about whether re-consent is needed.
- Data availability (aka data transparency) shouldn’t remove the need for informed consent. Yes, BITSS likes open data, and Big Data, and more data, but transparency without respect for persons isn’t our goal. As the line between human subjects and their data gets fuzzier with more available data – through apps, machine learning, etc. – the issue of informed consent remains relevant. Some of this type of data is made widely available through a combination of lack of knowledge (i.e. most people may not know just how much data/information they are sharing and for what research purpose it can be used, such as psychographic modeling) and trust (i.e. most people may trust they are personally protected when sharing data with one user, not realizing the data is shared with another user). Without careful consideration of informed consent and respect for persons, mistrust and subsequent departures in this sort of data sharing are sure to follow.
Since the Revised Common Rule reflects the largest effort to update the regulation since 1991, it’s unlikely the remaining tension will be directly addressed in the Common Rule anytime soon. But we hope to keep the conversation going with our community, and to continue integrating transparency and reproducibility into the research ecosystem while carefully considering them alongside the ethical principles of respect for persons, beneficence, and justice. For more on this, stay tuned for our upcoming post on our support to the Inter-American Development Bank as they look to scale up Transparent, Reproducible, and Ethical Research.
 These are the Mertonian norms described in Merton, Robert (1942) “A Note on Science and Democracy,” Journal of Legal and Political Sociology: 1: 115-26. Reprinted as “Science and Democratic Social Structure” in Social Theory and Social Structure ([1949, 1957]1968), pp. 604-615.
 Often this is because stripping away direct identifiers (such as names, addresses, etc.) is insufficient to protect confidentiality. The data may also include indirect identifiers that are visible or well-known about the human subjects involved. To sufficiently de-identify the data for public access while also adhering to promises of confidentiality, the researcher may need to remove and/or permutate variables in a way that alters the original, raw data.
 Fully de-identified data may be shared publicly to facilitate new and innovative research analysis, but computational reproducibility may require access to the fully identifiable data, which may only be shared on a limited or very restricted basis.
 Direct access means public, direct download data available to new users. Limited access introduces some minimal barrier to data access, such as submitting a proposal for access and signing a Data Use Agreement. Restricted-access introduces higher barriers to data access, and may include access to the data only through a protected data enclave. Examples of restricted-access are found at ICPSR – https://www.icpsr.umich.edu/icpsrweb/content/ICPSR/access/restricted/
 46.116 General requirements for informed consent (b) (9) One of the following statements about any research that involves the collection of identifiable private information or identifiable biospecimens: (i) A statement that identifiers might be removed from the identifiable private information or identifiable biospecimens and that, after such removal, the information or biospecimens could be used for future research studies or distributed to another investigator for future research studies without additional informed consent from the subject or the legally authorized representative, if this might be a possibility; or (ii) A statement that the subject’s information or biospecimens collected as part of the research, even if identifiers are removed, will not be used or distributed for future research studies
 46.116 General requirements for informed consent (d): Broad consent for the storage, maintenance, and secondary research use of identifiable private information or identifiable biospecimens (collected for either research studies other than the proposed research or non-research purposes) is permitted as an alternative to the informed consent requirements in paragraphs (b) and (c) of this section….(5) Unless the subject or legally authorized representative will be provided details about specific research studies, a statement that they will not be informed of the details of any specific research studies that might be conducted using the subject’s identifiable private information or identifiable biospecimens, including the purposes of the research, and that they might have chosen not to consent to some of those specific research studies.
 For an example of this discussion, please reference Knott, Eleanor (Forthcoming) ‘Beyond the Field: Ethics after Fieldwork in Politically Dynamic Contexts’, Perspectives on Politics. (conditionally accepted) Accessed on February 1, 2019 (http://eprints.lse.ac.uk/87445/1/Knott_Beyond%20the%20Field.pdf)
 The case of use of Havasupai tribe data beyond the original research scope without consent is a good example of this risk. For more information, reference Sterling, Robyn. Genetic Research among Havasupai: A Cautionary Tale. Virtual Mentor. 2011;13(2):113-117. DOI: 10.1001/virtualmentor.2011.13.2.hlaw1-1102