Reproducibility of Research: Issues and Proposed Remedies – A Sackler Colloquium Reflection – Berkeley Initiative for Transparency in the Social Sciences

Guest post: Cynthia M. Kroeger, Postdoctoral Fellow, Nutrition Obesity Research Center, University of Alabama at Birmingham

From March 8-10, 2017, I had the honor of joining scientific researchers, publishers, journalists, entrepreneurs, and funders from all over the world to discuss Reproducibility of Research: Issues and Proposed Remedies. This Sackler colloquium of the National Academy of Sciences was organized by Academy members David B. Allison, Richard Shiffrin, and Victoria Stodden, and dedicated to their late co-organizer, Stephen E. Fienberg, who long ago inspired reproducibility projects that some of the investigators mentioned here still pursue. Sackler colloquia typically unite disciplines to address international currents of scientific priority, and I must say, I was particularly impressed by the degree of interdisciplinarity and rigorous research represented at this meeting. As a BITSS Catalyst with a multidisciplinary background, I felt inspired to extract motifs from this collective of expertise as potential avenues for implementation and to reflect on speakers’ discussion points that stood out to me as key.

To summarize generally, the following meta-level themes seemed to materialize as the meeting progressed:

Evidence. Obtaining evidence to substantiate hypotheses is our goal, which relies on a logically valid statistical system. Understanding the difference between belief and degree of certainty seems to be key.
Standards. A universal, authoritative platform that provides easy access to substantiated standards for all steps of the research process may improve understanding of methods and quickly inform decision making.
Communication. To reduce vicious and ad hominem attacks from peers and the public (e.g., in response to potential conflicts, innocent errors, irreproducibility, null findings, competing interests, and differing opinions), encourage constructive debate about topics deemed valuable, and empower researchers to advance science creatively, we perhaps need a cultural understanding of, language for, and education on:
- Probability and statistics: Where information comes from and how certainty is limited
- Cognitive neuroscience: Ways our cognitive judgement and decision making processes are biased
- Formal logic: Ways arguments are sound versus invalid
- Ethics: What values modern standards are intended to uphold and those we hold ourselves
- Psychobiology: Ways we respond to threat, how to create safety, and how to act responsibly
Interdisciplinarity. “More and more, the unit of comprehension is going to be group comprehension, where you simply have to rely on a team because you can’t understand it all yourself.” – Daniel Dennett

With the remaining points below, I summarize and react to comments made by speakers that resonated with me; they may thus reflect my experience and understanding more than the scope of their talks or intent. Full presentations can be found on the Sackler Colloquia YouTube Channel, and resulting papers will be published in PNAS.

Day one focused on issues that challenge reproducibility.

Following welcoming remarks by David Allison, Marcia McNutt, and Robert Groves, Victoria Stodden encouraged us to help foster productive discourse by framing issues with precise vocabulary. She differentiated empirical, statistical, and computational reproducibility, for instance, by distinguishing “independently re-implementing an experiment” from “re-running original code on the same data” to get the same result, which effectively primed my consideration for respective stakeholders, issues, and goals.
Kay Dickersin highlighted how systematic reviews best inform evidence-based practice, yet estimations of certainty depend on whether core outcomes are measured and specified completely in primary studies.
David Allison distinguished invalidating errors from biased processes (e.g., incorrect analyses of data vs p-hacking) and described how sharing data and collaborating with methodologists make possible the correction of errors in publications. This made me think about who ultimately bears responsibility for quality control and how such practices may, perhaps appropriately, shift responsibility to original authors.
Madhu Mazumdar discussed how the Continuously Learning Health System may help mitigate inefficiencies when generating knowledge from the routine health care system and benefit more parties.
Lehana Thabane shared numerous cases of insufficient reporting in primary science and reiterated the moral and economic imperative of research transparency.
David Madigan elucidated novel methods of obtaining large-scale evidence from real-world data and emphasized the need to reduce bias by moving from craft (i.e., subjective) to systematic methods.
Isabelle Boutron showed how research distortion in mass media often mirrors primary science.
Andrew Brown invited panelists Inder Verma, Phil Campbell, Kelvin Droegemeier, Veronique Kiermer, and Virginia Barbour to consider ways to motivate author compliance with standards already in place. Some thought publishing checklists would help hold authors accountable for stating adherence accurately.
The 17th Annual Sackler Lecture followed, during which Nobel Laureate Randy Schekman urged the scientific community to assess research using the principles of DORA (the San Francisco Declaration on Research Assessment) instead of Journal Impact Factor.

Day two focused on remedies.

Victoria Stodden described current efforts to advance infrastructure for the whole tale of science, in ways that would foster efficiency, productivity, discovery, and reproducibility via automation, archiving, and linkage to references and legal details.
Roberta W. Scherer showed how registry integration and repositories helped improve the accessibility of trial information and researchers’ accountability to standards, but there is still much room for growth.
Joachim Vandekerckhove pointed to C. S. Peirce’s rules for hypothesis evaluation and decision rules as means to bolster objectivity when formally quantifying evidence.
Yoav Benjamini called on scientists to rigorously assess replicability before deeming a result “replicated” and to improve within-study replicability by addressing the effect of selection and relevant variability.
Emery Brown described how an understanding of statistical uncertainty is fundamental to the accurate interpretation of science, collectively lacking within both scientific and lay communities, and likely to improve if we integrate a longitudinal curriculum on quantifying certainty into the early education system.
Elizabeth Iorns showed how Science Exchange can expedite the replication process through its pre-negotiated legal agreements with affiliates and refined workflows for power calculations, communication with original authors, protocol pre-registrations, and reagent validation.
Susan Fiske explained how internalization of evidence and values is a more robust agent of behavioral change than compliance with incentives under surveillance or conformity to norms of identity groups.
Hilda Bastian described PubMed efforts to encourage a culture of post-publication peer review. Editorial expressions of concern, which are intended to draw attention to publications that may warrant correction or retraction, have thus far been sparse, inconspicuous, or ineffective, and will be made identifiable in 2017.
Trevor Butterworth explained how Sense About Science USA helps foster accurate communication of scientific evidence by training researchers, journalists, and editors in statistics and speaking scientifically.
Catherine Woteki suggested that placing firewalls between funding sources and data could enhance trust and broaden resources, and she shared the USDA’s achievements in making sponsored publications and datasets open access.
Kathryn Kaiser led a panel discussion by asking panelists to offer advice to early-career meta-researchers. Jeffrey Flier recommended we base grounds for promotion on our ability to openly communicate controversies within our work, and Daniele Fanelli encouraged interest in meta-research by inviting us to challenge the “science in crisis” narrative and consider whether data suggest bias is at its worst, domain-specific, or improving with salience and refined resources.

Day three focused on research goals.

Richard Shiffrin described how remedies may assist some goals while impeding others (e.g., inflammatory rhetoric may help inspire the granting of research funds while eroding the public’s trust in science).
Kathleen Hall Jamieson discussed how the ability of non-experts to interpret science accurately may depend on the precision of language within original research. She had us consider examples like Einstein wishing he had named his Theory of Relativity the Theory of Invariance and how the phrase ‘herd immunity’ may carry unintended implications compared to ‘community immunity’.
C. K. Gunsalus spoke of the need to improve ethics training and shift focus from results to professional processes (e.g., education on how to dispute professionally, maneuver justified credit, document work, select mentors/mentees, and advance careers sensibly). Solidarity in the room peaked when she asked for a show of hands from those who have had to take online compliance training that had nothing to do with the type of research they do. Addressing contextual needs may require creative plans to document training.
David Moher followed by encouraging us to test interventions on irreproducibility at the journal level in order to obtain evidence of efficacy and inform science policy.
Richard Nakamura showed how the NIH has worked to raise awareness by convening with publishers and to strengthen the internal validity of proposals by elevating focus on scientific premise and methods.
Brian Nosek described how workflow and storage integration and support via The Open Science Framework can simplify the rigorous implementation of research at each step of the research cycle.
Giovanni Parmigiani illustrated new statistical approaches to reproducibility – specifically, how response adaptive trial designs that include biomarkers and statistical methods that integrate various outcome forms may help maximize trial utility and personalize medicine.
Keith Baggerly encouraged us to challenge mistakes with both persistence and replicable data.
Richard Shiffrin and David Allison closed by speaking with gratitude for the practice of science as a privilege and for efforts that foster its self-correcting process and potential for utility.

One of my favorite books is “Surely You’re Joking, Mr. Feynman!” It’s fun! And it roots value in knowledge. Several speakers quoted the Physics Laureate during their talks, and I’ll follow suit by closing with an excerpt:

The first principle is that you must not fool yourself — and you are the easiest person to fool.
…
After you’ve not fooled yourself, it’s easy not to fool other scientists.
…
I would like to add … that you should not fool the layman when you’re talking as a scientist.
…
I am not trying to tell you what to do … when you’re not trying to be a scientist, but just trying to be an ordinary human being. We’ll leave those problems up to you and your rabbi.

I’m talking about a specific, extra type of integrity that is not lying, but bending over backwards to show how you’re maybe wrong, that you ought to have when acting as a scientist.

And this is our responsibility as scientists, certainly to other scientists, and I think to laymen.

— Richard P. Feynman


April 3, 2017 / September 1, 2017
