Research Transparency in Brazilian Political and Social Science: A First Look [Political Science, SSMART]

George Avelino, Scott Desposato

Scott Desposato and George Avelino conducted the first meta-analysis and reproducibility analysis of political science in Brazil using all articles published in the last five years in the three leading Brazilian political science and general social science journals: the Brazilian Political Science Review, the Revista de Ciência Política, and Dados. They were able to completely replicate only approximately 5% of quantitative articles.

Publications associated with this project:

  • Avelino, George, and Scott Desposato. “Transparency and Replicability in Brazilian Political and Social Science: A First Look.” Dados, 2021.
  • Avelino, George, and Scott Desposato. “Transparency and Replicability in Brazilian Political and Social Science: A First Look.” BITSS, 2018.

Pre-Analysis Plans: A Stocktaking [Economics, Interdisciplinary, Political Science, SSMART]

George Ofosu, Daniel Posner

The evidence-based community has championed the public registration of pre-analysis plans (PAPs) as a solution to the problem of research credibility, but without any evidence that PAPs actually bolster the credibility of research. Ofosu and Posner analyze a representative sample of 195 PAPs from the American Economic Association (AEA) and Evidence in Governance and Politics (EGAP) registration platforms to assess whether PAPs are sufficiently clear, precise and comprehensive to be able to achieve their objectives of preventing “fishing” and reducing the scope for post-hoc adjustment of research hypotheses. They also analyze a subset of 93 PAPs from projects that have resulted in publicly available papers to ascertain how faithfully they adhere to their pre-registered specifications and hypotheses. They find significant variation in the extent to which PAPs are accomplishing the goals they were designed to achieve.

MetaLab: Paving the way for easy-to-use, dynamic, crowdsourced meta-analyses [Cognitive Science, Interdisciplinary, Psychology, SSMART]

Christina Bergmann, Sho Tsuji, Molly Lewis, Mika Braginsky, Page Piccinini, Alejandrina Cristia, Michael C. Frank

Aggregating data across studies is a central challenge in ensuring cumulative, reproducible science. Meta-analysis is a key statistical tool for this purpose. The goal of this project is to support meta-analysis using MetaLab, an interface and central repository. MetaLab supports collaborative hosting and the creation of dynamic meta-analyses based entirely on open tools. The platform is supported by a community of researchers and students who can become creators, curators, contributors, and/or users of meta-analyses. Currently, MetaLab hosts meta-analyses only in Early Language and Cognitive Development; however, extensions to other subfields of the social sciences are possible.

Publications associated with this project:

  • Bergmann, C., Tsuji, S., Piccinini, P. E., Lewis, M. L., Braginsky, M., Frank, M. C., & Cristia, A. (2018). Promoting Replicability in Developmental Research Through Meta-analyses: Insights From Language Acquisition Research. Child Development, 89(6), 1996–2009.

Publication Bias and Editorial Statement on Negative Findings [Economics, SSMART]

Cristina Blanco-Perez, Abel Brodeur

The introduction of confidence at 95 percent or 90 percent has led the academic community to accept more easily starry stories with marginally significant coefficients than starless ones with insignificant coefficients. In February 2015, the editors of eight health economics journals sent out an editorial statement encouraging referees to accept studies that: “have potential scientific and publication merit regardless of whether such studies’ empirical findings do or do not reject null hypotheses that may be specified.” Using a differences-in-differences approach, Blanco-Perez and Brodeur find that the editorial statement decreased the proportion of tests rejecting the null hypothesis by 18 percentage points. This finding suggests that incentives may be aligned to promote more transparent research.

Publications associated with this project:

  • Blanco-Perez, Cristina, and Abel Brodeur. “Publication Bias and Editorial Statement on Negative Findings.” The Economic Journal 130, no. 629 (July 1, 2020): 1226–47.

Developing a Guideline for Reporting Mediation Analyses (AGReMA) in randomized trials and observational studies [Public Health, SSMART]

Hopin Lee, James H. McAuley, Steven Kamper, Nicolas Henschke, Christopher M. Williams

The Guideline for Reporting Mediation Analyses (AGReMA) is an evidence- and consensus-based guideline that provides recommendations for reporting primary and secondary mediation analyses of randomized trials and observational studies. It is designed to assist authors, peer-reviewers, and journal editors to ensure accurate, consistent, and transparent reporting of studies that use mediation analyses.

The AGReMA initiative followed the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) network methodological framework, including a consensus meeting with methodologists, statisticians, clinical trialists, epidemiologists, psychologists, clinical researchers, clinicians, implementation scientists, evidence synthesis experts, representatives of the EQUATOR network, and journal editors. Learn more and find AGReMA checklists at

Publications associated with this project:

  • Lee, Hopin, Aidan G. Cashin, Sarah E. Lamb, Sally Hopewell, Stijn Vansteelandt, Tyler J. VanderWeele, David P. MacKinnon, et al. “A Guideline for Reporting Mediation Analyses of Randomized Trials and Observational Studies: The AGReMA Statement.” JAMA 326, no. 11 (September 21, 2021): 1045–56.
  • Cashin, Aidan G., James H. McAuley, Sarah E. Lamb, Sally Hopewell, Steven J. Kamper, Christopher M. Williams, Nicholas Henschke, and Hopin Lee. “Development of A Guideline for Reporting Mediation Analyses (AGReMA).” BMC Medical Research Methodology 20, no. 1 (February 3, 2020): 19.

A Large-Scale, Interdisciplinary Meta-Analysis on Behavioral Economics Parameters [Economics, SSMART]

Colin Camerer, Taisuke Imai

Imai, Rutter, and Camerer examine 220 estimates of the present-bias parameter from 28 articles using the Convex Time Budget protocol. The literature shows that people are on average present biased, but the estimates exhibit substantial heterogeneity across studies. There is evidence of modest selective reporting in the direction of overreporting present bias. The primary source of the heterogeneity is the type of reward, either monetary or non-monetary reward, but the effect is weakened after correcting for potential selective reporting. In the studies using the monetary reward, the delay until the issue of the reward associated with the “current” time period is shown to influence the estimates of the present bias parameter.

Publications associated with this project:

  • Imai, Taisuke, Tom A Rutter, and Colin F Camerer. “Meta-Analysis of Present-Bias Estimation Using Convex Time Budgets.” The Economic Journal, no. ueaa115 (September 29, 2020).

Bayesian Evidence Synthesis: New Meta-Analytic Procedures for Statistical Evidence [Psychology, SSMART]

E.J. Wagenmakers, Raoul Grasman, Quentin F. Gronau, Felix Schönbrodt

Eric-Jan Wagenmakers and colleagues developed a suite of meta-analytic techniques for Bayesian evidence synthesis, addressing a series of challenges that currently constrain classical meta-analytic procedures. These techniques include (1) an application of bridge sampling to obtain Bayes factors for random-effects meta-analysis; (2) the computation of Bayes factors for fixed-effect versus random-effects meta-analysis; (3) a proposal for an informed prior on study heterogeneity based on a comprehensive literature search; (4) model-averaged evidence across fixed-effect and random-effects meta-analyses, thereby accounting for model uncertainty; and (5) a proposal for a running power analysis in the field of meta-analysis.

Publications associated with this project:

  • Gronau, Quentin Frederik, Henrik Singmann, and Eric-Jan Wagenmakers. “Bridgesampling: An R Package for Estimating Normalizing Constants.” MetaArXiv, December 7, 2017.
  • Scheibehenne, Benjamin, Quentin F. Gronau, Tahira Jamil, and Eric-Jan Wagenmakers. “Fixed or Random? A Resolution Through Model Averaging: Reply to Carlsson, Schimmack, Williams, and Bürkner (2017).” Psychological Science 28, no. 11 (November 1, 2017): 1698–1701.

Assessing Bias from the (Mis)Use of Covariates: A Meta-Analysis [Political Science, SSMART]

Gabriel Lenz, Alexander Sahn

Lenz and Sahn examine how often research findings depend on suppression effects, or covariate-induced increases in effect sizes. Researchers generally scrutinize suppression effects because they want reassurance that authors have a strong explanation for the increase in effect size, especially when the statistical significance of the key finding depends on it.

They find that 30-40% of observational articles in a leading journal depend on suppression effects for statistical significance. Although suppression effects are of course potentially justifiable — to address suppressor variables — none of the articles justifies or discloses them. These findings may point to a hole in the review process: journals are accepting articles that depend on suppression effects without readers, reviewers, or editors being made aware.

Publications associated with this project:

  • Lenz, Gabriel S., and Alexander Sahn. “Achieving Statistical Significance with Control Variables and Without Transparency.” Political Analysis (November 2020): 1–14.

Integrated Theoretical Model of Condom Use for Young People in Sub-Saharan Africa [Psychology, SSMART]

Cleo Protogerou, Martin Hagger, Blair T. Johnson

Cleo Protogerou, Martin Hagger, and Blair Johnson developed and tested an integrated theory of the determinants of condom use among young people in Sub-Saharan Africa using meta-analytic path analysis. The meta-analysis encompassed fifty-five studies (N = 55,069), comprising 72 independent data sets, and representing thirteen Sub-Saharan African nations. Their analysis revealed significant direct and positive effects of attitudes, norms, control, and risk perceptions on condom use intentions, and of intention and control on condom use. They also found negative effects of perceived barriers on use. In conclusion, their integrated theory provides an evidence-based framework to study antecedents of condom use in Sub-Saharan African youth and to develop targets for effective condom promotion interventions.

Publications associated with this project:

  • Protogerou, Cleo, Blair T. Johnson, and Martin S. Hagger. “An Integrated Model of Condom Use in Sub-Saharan African Youth: A Meta-Analysis.” Health Psychology 37, no. 6 (2018): 586–602.

Welfare Comparisons Across Expenditure Surveys [Economics, SSMART]

Elliott Collins, Ethan Ligon, Reajul Chowdhury

This project aims to replicate and combine three recent experiments on capital transfers to poor households in two distinct phases. The first phase will produce three concise internal replications. These will be accompanied by a report detailing the challenges faced and tools and methods used. The final goal will be to produce practical insights useful to students and new researchers. The second phase will combine these experiments in an extended analysis to explore how economic theory can allow for meta-analysis and comparative impact evaluation among datasets that would otherwise be problematic or even impossible to compare.

Investigation of Data Sharing Attitudes in the Context of a Meta-Analysis [Psychology, Public Health, Public Policy, SSMART]

Joshua Polanin, Mary Terzian

In this study, Joshua Polanin and Mary Terzian provide an explanation of why primary study authors are unwilling to share their data and evaluate whether sending a data-sharing agreement affects participants’ willingness to share individual participant data (IPD). They sampled and surveyed more than 700 researchers whose studies had been included in recently published meta-analyses, splitting the sample into control and treatment groups and using a hypothetical data-sharing agreement as an intervention. Participants who received a data-sharing agreement were more willing to share their data set, compared with control participants, even after controlling for demographics and pretest values (d = 0.65, 95% CI [0.39, 0.90]). A member of the control group is 24% more likely to share her data set should she receive the data-sharing agreement.

These findings shed light on data-sharing practices, attitudes, and concerns and can be used to inform future meta-analysis projects seeking to collect IPD, as well as the field at large.

Publications associated with this project:

  • Polanin, Joshua R., and Mary Terzian. “A Data-Sharing Agreement Helps to Increase Researchers’ Willingness to Share Primary Data: Results from a Randomized Controlled Trial.” Journal of Clinical Epidemiology 106 (February 1, 2019): 60–69.

Will Knowledge about More Efficient Study Designs Increase the Willingness to Pre-Register? [Interdisciplinary, SSMART]

Daniel Lakens

Pre-registration is a straightforward way to make science more transparent and to control Type 1 error rates. Pre-registration is often presented as beneficial for science in general, but rarely as a practice that leads to immediate individual benefits for researchers. One benefit of pre-registered studies is that they allow for non-conventional research designs that are more efficient than conventional designs. For example, by performing one-tailed tests and sequential analyses, researchers can perform well-powered studies much more efficiently. In this project, Daniel Lakens examined whether such non-conventional but more efficient designs are considered appropriate by editors under the pre-condition that the analysis plans are pre-registered, and if so, whether researchers are more willing to pre-register their analysis plan to take advantage of the efficiency benefits of non-conventional designs.

Reporting Guidance for Trial Protocols of Social Science Interventions [Social Science, SSMART]

Sean Grant

Protocols improve the reproducibility and accessibility of social science research. Given deficiencies in trial protocol quality, the SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) Statement provides an evidence-based set of items to describe in protocols of clinical trials on biomedical interventions. This project introduces items in the SPIRIT Statement to a social science audience, explaining their application to protocols of social intervention trials. Additional reporting items of relevance to protocols of social intervention trials are also presented. Items, examples, and explanations are derived from the SPIRIT 2013 Statement, other guidance related to reporting of protocols and completed trials, publicly accessible trial registrations and protocols, and the results of an online Delphi process with social intervention trialists. The use of these standards by researchers, journals, trial registries, and educators should increase the transparency of trial protocols and registrations, and thereby increase the reproducibility and utility of social intervention research.

Publications associated with this project:

  • Grant, Sean. “Developing an Intervention Trial Protocol and Statistical Analysis Plan: Reporting Guidance for Behavioral and Social Scientists.” Working Paper. Open Science Framework, August 30, 2019.

Distributed Meta-Analysis: A New Approach for Constructing Collaborative Literature Reviews [Economics, SSMART]

Solomon Hsiang, James Rising

Scientists and consumers of scientific knowledge can struggle to synthesize the quickly evolving state of empirical research. Even where recent, comprehensive literature reviews and meta-analyses exist, there is frequent disagreement on the criteria for inclusion and the most useful partitions across the studies. To address these problems, we create an online tool for collecting, combining and communicating a wide range of empirical results. The tool provides a collaborative database of statistical parameter estimates, which facilitates the sharing of key results. Scientists engaged in empirical research or in the review of others’ work can input rich descriptions of study parameters and empirically estimated relationships. Consumers of this information can filter the results according to study methodology, the studied population, and other attributes. Across any filtered collection of results, the tool calculates pooled and hierarchically modeled common parameters, as well as the range of variation between studies.
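As a rough illustration of the pooling step described above, the sketch below computes an inverse-variance (fixed-effect) pooled estimate together with a DerSimonian-Laird estimate of between-study variance. It is a stand-in under stated assumptions, not the tool's implementation: the function name `pool_estimates` and the input format are hypothetical, and the project description does not say which estimators the tool uses.

```python
import math


def pool_estimates(estimates, std_errors):
    """Pool study-level estimates (hypothetical helper, not the tool's API).

    Returns (pooled_estimate, pooled_std_error, tau_squared), where
    tau_squared is the DerSimonian-Laird between-study variance.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")

    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    # Fixed-effect pooled estimate and its standard error.
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Cochran's Q measures dispersion of the estimates around the pooled value.
    q_stat = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    degrees_of_freedom = len(estimates) - 1
    scale = total_weight - sum(w ** 2 for w in weights) / total_weight

    # DerSimonian-Laird between-study variance, truncated at zero.
    tau_squared = max(0.0, (q_stat - degrees_of_freedom) / scale)
    return pooled, pooled_se, tau_squared


if __name__ == "__main__":
    # Two made-up parameter estimates with equal precision.
    print(pool_estimates([0.2, 0.4], [0.1, 0.1]))
```

Filtering by methodology or population, as the description mentions, would amount to selecting a subset of the estimates before calling such a function.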

Publications associated with this project:

  • Rising, James, and Solomon Hsiang. “Distributed Meta-Analysis: A New Approach for Constructing Collaborative Literature Reviews.” Open Science Framework, July 8, 2017.

Panel Data and Experimental Design [Economics, SSMART]

Fiona Burlig, Matt Woerman, Louis Preonas

Burlig, Preonas, and Woerman develop a power calculation technique and accompanying software package for experiments that use panel data. They generalize Frison and Pocock (1992) to fully arbitrary error structures, thereby extending McKenzie (2012) to allow for non-constant serial correlation. Using Monte Carlo simulations and real-world panel data, they demonstrate that failing to account for arbitrary serial correlation ex ante yields experiments that are incorrectly powered under proper inference. By contrast, their “serial-correlation-robust” power calculations achieve correctly powered experiments in both simulated and real data.

Examining the Reproducibility of Meta-Analyses in Psychology [Psychology, SSMART]

Daniel Lakens, Marcel van Assen, Farid Anvari, Katherine Corker, James Grange, Heike Gerger, Fred Hasselman, Jacklyn Koyama, Cosima Locher, Ian Miller, Elizabeth Page-Gould, Felix Schönbrodt, Amanda Sharples, Barbara Spellman, Shelly Zhou

Meta-analyses are an important tool to evaluate the literature. It is essential that meta-analyses can easily be reproduced to allow researchers to evaluate the impact of subjective choices on meta-analytic effect sizes, but also to update meta-analyses as new data comes in, or as novel statistical techniques (for example to correct for publication bias) are developed. Research in medicine has revealed meta-analyses often cannot be reproduced. In this project, we examined the reproducibility of meta-analyses in psychology by reproducing twenty published meta-analyses. Reproducing published meta-analyses was surprisingly difficult. 96% of meta-analyses published in 2013-2014 did not adhere to reporting guidelines. A third of these meta-analyses did not contain a table specifying all individual effect sizes. Five of the 20 randomly selected meta-analyses we attempted to reproduce could not be reproduced at all due to lack of access to raw data, no details about the effect sizes extracted from each study, or a lack of information about how effect sizes were coded. In the remaining meta-analyses, differences between the reported and reproduced effect size or sample size were common. We discuss a range of possible improvements, such as more clearly indicating which data were used to calculate an effect size, specifying all individual effect sizes, adding detailed information about equations that are used, and how multiple effect size estimates from the same study are combined, but also sharing raw data retrieved from original authors, or unpublished research reports. This project clearly illustrates there is a lot of room for improvement when it comes to the transparency and reproducibility of published meta-analyses.

Publications associated with this project:

  • Lakens, Daniel, Elizabeth Page-Gould, Marcel A. L. M. van Assen, Bobbie Spellman, Felix Schönbrodt, Fred Hasselman, Katherine S. Corker, et al. “Examining the Reproducibility of Meta-Analyses in Psychology: A Preliminary Report.” Working Paper. MetaArXiv, March 31, 2017.

Aggregating Distributional Treatment Effects: A Bayesian Hierarchical Analysis of the Microcredit Literature [Economics, SSMART]

Rachael Meager

This paper develops methods to aggregate evidence on distributional treatment effects from multiple studies conducted in different settings and applies them to the microcredit literature. Several randomized trials of expanding access to microcredit found substantial effects on the tails of household outcome distributions, but the extent to which these findings generalize to future settings was not known. Aggregating the evidence on sets of quantile effects poses additional challenges relative to average effects because distributional effects must imply monotonic quantiles and pass information across quantiles. Using a Bayesian hierarchical framework, I develop new models to aggregate distributional effects and assess their generalizability. For continuous outcome variables, the methodological challenges are addressed by applying transforms to the unknown parameters. For partially discrete variables such as business profits, I use contextual economic knowledge to build tailored parametric aggregation models. I find generalizable evidence that microcredit has a negligible impact on the distribution of various household outcomes below the 75th percentile, but above this point, there is no generalizable prediction. Thus, there is strong evidence that microcredit typically does not lead to worse outcomes at the group level, but no generalizable evidence on whether it improves group outcomes. Households with previous business experience account for the majority of the impact in the tails and see large increases in the upper tail of the consumption distribution in particular.

Publications associated with this project:

  • Meager, Rachael. “Aggregating Distributional Treatment Effects: A Bayesian Hierarchical Analysis of the Microcredit Literature.” Working Paper. Open Science Framework, September 1, 2017.

Using P-Curve to Assess Evidentiary Value of Social Psychology Publications [Psychology, SSMART]

Leif Nelson

The proposed project will use p-curve, a new meta-analytic tool, to assess the evidentiary value of studies from social psychology and behavioral marketing. P-curve differs from other meta-analytic methods by analyzing the distribution of p-values to determine the likelihood that a study provides evidence for the existence of an effect; in the event that there is not evidentiary value in a study, p-curve can also determine whether a study is powered such that it would detect an effect 33% of the time, given it exists. We will apply p-curve to each empirical paper in the first issue of 2014 in three top-tier journals: Psychological Science, The Journal of Personality and Social Psychology, and The Journal of Consumer Research. Additionally, we will conduct a direct replication of one study from each of these issues.
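The idea of reading evidentiary value off the distribution of p-values can be illustrated with a deliberately simplified binomial check. If there is no true effect, significant p-values are uniformly distributed, so about half of them should fall below .025; a clear excess of very small p-values suggests right skew. The function name `p_curve_binomial_test` is hypothetical, and this sketch is an assumption about one simple variant, not the project's full procedure (it omits the continuous tests and the 33%-power comparison).

```python
from math import comb


def p_curve_binomial_test(p_values, alpha=0.05):
    """Simplified right-skew check on significant p-values (illustrative only).

    Returns (n_low, n_significant, one_sided_p), where one_sided_p is the
    probability of observing at least n_low p-values below alpha / 2 if
    significant p-values were uniform on (0, alpha).
    """
    significant = [p for p in p_values if p < alpha]
    n_significant = len(significant)
    if n_significant == 0:
        raise ValueError("no statistically significant p-values to analyze")

    # Count the "low" significant p-values (below half the threshold).
    n_low = sum(1 for p in significant if p < alpha / 2)

    # Upper tail of a Binomial(n_significant, 0.5) distribution.
    tail_count = sum(comb(n_significant, k) for k in range(n_low, n_significant + 1))
    one_sided_p = tail_count / 2 ** n_significant
    return n_low, n_significant, one_sided_p


if __name__ == "__main__":
    # Made-up p-values; the nonsignificant one (0.2) is ignored.
    print(p_curve_binomial_test([0.001, 0.01, 0.02, 0.004, 0.04, 0.2]))
```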

External Validity in U.S. Education Research [Economics, Education, Public Policy, SSMART]

Sean Tanner

As methods for internal validity improve, methodological concerns have shifted toward assessing how well the research community can extrapolate from individual studies. Under recent federal granting initiatives, over $1 billion has been awarded to education programs that have been validated by a single randomized or natural experiment. If these experiments have weak external validity, scientific advancement is delayed and federal education funding might be squandered. By analyzing trials clustered within interventions, this research describes how well a single study’s results are predicted by additional studies of the same intervention in addition to analyzing how well study samples match the target populations of interventions. I find that U.S. education trials are conducted on samples of students who are systematically less white and more socioeconomically disadvantaged than the overall student population. Moreover, I find that effect sizes tend to decay in the second and third trials of interventions.

Publications associated with this project:

  • Tanner, Sean. “External Validity in U.S. Education Research.” Working Paper. Open Science Framework, November 1, 2017.

Publication Bias in Meta-Analyses from Psychology and Medicine: A Meta-Meta-Analysis [Life Sciences, Medicine, Metascience (Methods and Archival Science), Psychology, SSMART]

Robbie van Aert, Jelte Wicherts, Marcel van Assen

Publication bias is a substantial problem for the credibility of research in general and of meta-analyses in particular, as it yields overestimated effects and may suggest the existence of non-existing effects. Although there is consensus that publication bias exists, how strongly it affects different scientific literatures is currently less well-known. We examined evidence of publication bias in a large-scale data set of primary studies that were included in 83 meta-analyses published in Psychological Bulletin (representing meta-analyses from psychology) and 499 systematic reviews from the Cochrane Database of Systematic Reviews (CDSR; representing meta-analyses from medicine). Publication bias was assessed on all homogeneous subsets (3.8% of all subsets of meta-analyses published in Psychological Bulletin) of primary studies included in meta-analyses because publication bias methods do not have good statistical properties if the true effect size is heterogeneous. Publication bias tests did not reveal evidence for bias in the homogeneous subsets. Overestimation was minimal but statistically significant, providing evidence of publication bias that appeared to be similar in both fields. However, a Monte-Carlo simulation study revealed that the creation of homogeneous subsets resulted in challenging conditions for publication bias methods since the number of effect sizes in a subset was rather small (median number of effect sizes equaled 6). Our findings are in line with publication bias ranging from no bias to, in the most extreme case, only 5% of statistically nonsignificant effect sizes being published. These and other findings, in combination with the small percentages of statistically significant primary effect sizes (28.9% and 18.9% for subsets published in Psychological Bulletin and CDSR), led to the conclusion that evidence for publication bias in the studied homogeneous subsets is weak but suggestive of mild publication bias in both psychology and medicine.

Publications associated with this project:

  • Aert, Robbie C. M. van, Jelte M. Wicherts, and Marcel A. L. M. van Assen. “Publication Bias Examined in Meta-Analyses from Psychology and Medicine: A Meta-Meta-Analysis.” PLOS ONE 14, no. 4 (April 12, 2019): e0215052.

Open Science & Development Engineering: Evidence to Inform Improved Replication Models [Economics, Engineering and Computer Science, SSMART]

Paul Gertler

Replication is essential for building confidence in research studies, yet it is still the exception rather than the rule. That is not necessarily because funding is unavailable — it is because the current system makes original authors and replicators antagonists. Focusing on the fields of economics, political science, sociology, and psychology, in which ready access to raw data and software code are crucial to replication efforts, we survey deficiencies in the current system.

To see how often the posted data and code could readily replicate original results, we attempted to recreate the tables and figures of a number of papers using the code and data provided by authors. Of 415 articles published in 9 leading economics journals in May 2016, 203 were empirical papers that did not contain proprietary or otherwise restricted data. We were able to replicate only a small minority of these papers. Overall, of the 203 studies, 76% published at least one of the 4 files required for replication: the raw data used in the study (32%); the final estimation data set produced after data cleaning and variable manipulation (60%); the data-manipulation code used to convert the raw data to the estimation data (42%, but only 16% had both raw data and usable code that ran); and the estimation code used to produce the final tables and figures (72%). The estimation code was the file most frequently provided. But it ran in only 40% of these cases. We were able to produce final tables and figures from estimation data in only 37% of the studies analyzed. And in only 14% of 203 studies could we do the same starting from the raw data.

We propose reforms that can both encourage and reinforce better behavior — a system in which authors feel that replication of software code is both probable and fair, and in which less time and effort is required for replication.

How Often Should We Believe Positive Results? Assessing the Credibility of Research Findings in Development Economics [Economics, SSMART]

Eva Vivalt, Aidan Coville

Under-powered studies combined with low prior beliefs about intervention effects increase the chances that a positive result is overstated. We collect prior beliefs about intervention impacts from 125 experts to estimate the false positive and false negative report probabilities (FPRP and FNRP) as well as Type S (sign) and Type M (magnitude) errors for studies in development economics. We find that the large majority of studies in our sample are generally credible. We discuss how more systematic collection and use of prior expectations could help improve the literature.
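To make the link between prior beliefs, power, and credibility concrete, the sketch below computes false positive and false negative report probabilities from a prior probability that an effect exists, a study's power, and its significance level. The description above does not give the formulas, so the code uses the standard textbook definitions; the function names are hypothetical, and the authors' exact procedure may differ.

```python
def false_positive_report_probability(prior, power, alpha=0.05):
    """Probability that a statistically significant result is a false positive.

    prior: assumed probability that a true effect exists (0 < prior < 1).
    power: probability of detecting the effect if it exists.
    alpha: significance level of the test.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    false_positives = alpha * (1.0 - prior)  # significant, but no true effect
    true_positives = power * prior           # significant, and effect is real
    return false_positives / (false_positives + true_positives)


def false_negative_report_probability(prior, power, alpha=0.05):
    """Probability that a non-significant result is a false negative."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    false_negatives = (1.0 - power) * prior         # effect is real but missed
    true_negatives = (1.0 - alpha) * (1.0 - prior)  # correctly non-significant
    return false_negatives / (false_negatives + true_negatives)


if __name__ == "__main__":
    # A well-powered study with an even prior versus an under-powered
    # study with a sceptical prior (illustrative numbers only).
    print(false_positive_report_probability(prior=0.5, power=0.8))
    print(false_positive_report_probability(prior=0.1, power=0.2))
```

With these illustrative inputs, the low-power, low-prior case yields a much higher false positive report probability than the well-powered case, which is the pattern the first sentence of the abstract describes.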

Publications associated with this project:

  • Coville, Aidan, and Eva Vivalt. “How Often Should We Believe Positive Results? Assessing the Credibility of Research Findings in Development Economics.” MetaArXiv, August 13, 2017.