This post was written by Jo Weech, Research Transparency Program Manager at BITSS, and Liz Brown, Staff Scientist at CEGA and lead of CEGA’s cost transparency work.

The global development community has published more than 20,000 impact evaluations across health, education, agriculture, and financial inclusion since 1990. However, the ability to learn systematically from these evaluations has been limited, constrained by diverse study designs, fragmented reporting, and uneven capacity and incentives to interpret results. Without systematic and aggregated results, it is difficult to generalize findings from one study to another context, and designing future research with accurate budgets becomes harder still. This lack of aggregated results limits both the impact of individual studies and the value of the overall body of research.
Research papers contain far more usable information than what makes it into abstracts and policy briefs. A single academic paper may include data on program costs, geographic and demographic context, implementation conditions, and multiple welfare outcomes. If this information were systematically extracted and aggregated across studies, it could power meta-analyses, inform cost-effectiveness assessments, and generate insights that no single study can produce independently. This kind of evidence aggregation effort is in progress in the biomedical sciences through the use of Large Language Models (LLMs), but a similar effort has yet to take hold at scale in the social sciences.
CEGA is leading a large-scale effort to make this standardization and aggregation happen. Together with other impact evaluators and international development researchers, we are working to standardize research reporting, build harmonized costing datasets, and responsibly use AI tools for evidence synthesis. This work builds on promising advances in AI and on years of effort, alongside other major producers of evidence, to standardize a metadata schema for reporting research results. CEGA is developing workflows to make it more efficient to leverage the wealth of information contained in a single published academic paper, including costing data, effect sizes, geographic and demographic context, and implementation conditions, as well as the contextual information needed to interpret those data correctly. Across three distinct initiatives, we are creating new tools and frameworks for reporting research impacts and costs in meticulous detail and at scale.
Standardizing Research Reporting
Heterogeneity in research design is important and necessary: research questions inform the selection of appropriate methods to answer them. Variation in the choice of study population, intervention treatment arms, and outcome metrics all reflects an appropriate adaptation of research to context, and that specificity must remain part of the research process.
By contrast, heterogeneity in the reporting of research results is a genuine obstacle. Variation in how coefficients, effect sizes, and estimands of the treatment effect are reported slows down evidence synthesis, because analysts must undertake tedious review before estimates can be correctly extracted, harmonized, and entered into tables for meta-analysis.
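To make the harmonization step concrete, here is a minimal sketch in Python, using illustrative numbers rather than real study estimates, of converting two differently reported results onto a common Cohen’s d scale before they can enter a meta-analysis table:

```python
# Minimal sketch: harmonizing differently reported estimates onto a common
# effect-size scale (Cohen's d). All numbers are illustrative.

def cohens_d_from_coefficient(beta: float, sd_outcome: float) -> float:
    """Convert a raw regression coefficient on a binary treatment
    into a standardized mean difference (Cohen's d)."""
    return beta / sd_outcome

def se_of_d(n_treat: int, n_control: int, d: float) -> float:
    """Large-sample approximation to the standard error of Cohen's d."""
    n = n_treat + n_control
    return (n / (n_treat * n_control) + d**2 / (2 * n)) ** 0.5

# Study A reports a raw coefficient of 0.15 on an outcome with SD 0.60.
d_a = cohens_d_from_coefficient(beta=0.15, sd_outcome=0.60)
# Study B reports a standardized effect size directly.
d_b = 0.31

print(f"Study A, harmonized d: {d_a:.3f}, SE: {se_of_d(400, 400, d_a):.3f}")
print(f"Study B, reported d:   {d_b:.3f}, SE: {se_of_d(250, 260, d_b):.3f}")
```

In practice, harmonization also has to contend with clustered standard errors, differing estimands, and inconsistent outcome definitions, which is exactly the review burden that common reporting standards would reduce.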
This is a solvable problem, but solving it requires changing norms across the research pipeline and persuading researchers to adopt a common set of reporting standards and data-sharing practices. To lay this foundation, CEGA and 3ie initiated the Sustainable Evidence Infrastructure Commitment (SEIC) to bring together impact evaluation organizations and researchers to build shared standards for research design and reporting. Establishing these standards is a necessary step toward the collaborative infrastructure that evidence aggregation requires. But even well-reported studies are frequently missing costing data.
The Missing Costing Data
Donors allocate scarce development funding based on cost-effectiveness analyses, which rely on cost data as their first essential input. Yet cost data are available in only roughly 20% of published impact evaluations. This enormous gap in available cost evidence leaves policymakers mostly guessing about the cost-effectiveness of development interventions. A program could show strong effects on key outcomes yet be far more or less cost-effective than a comparable alternative; without cost data, it is impossible to know which would produce the most impact per dollar.
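As a stylized illustration of why this comparison is impossible without costs, consider two hypothetical programs (all numbers invented) measured on the same outcome; the cost-effectiveness ranking reverses the effect-size ranking:

```python
# Stylized, invented numbers: two programs measured on the same outcome.
programs = {
    "Program A": {"effect_sd": 0.30, "cost_per_participant_usd": 150.0},
    "Program B": {"effect_sd": 0.18, "cost_per_participant_usd": 40.0},
}

for name, p in programs.items():
    # Impact per $100 spent per participant.
    per_100 = 100 * p["effect_sd"] / p["cost_per_participant_usd"]
    print(f"{name}: {p['effect_sd']:.2f} SD effect, "
          f"{per_100:.2f} SD per $100 per participant")

# Program A has the larger effect, but Program B delivers more than twice
# the impact per dollar. Without the cost column, this is invisible.
```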
CEGA’s Cost Effectiveness program works to fill this evidence gap by establishing reporting standards for prospective studies and by developing tools, including a costing pre-analysis plan, that help researchers capture cost data systematically. In addition, CEGA has developed a method of cost harmonization and predictive modeling that makes more efficient use of the cost information studies do report, even when cost evidence is sparse.
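The predictive-modeling idea can be sketched simply. Assuming a harmonized dataset in which each row is a study with a reported unit cost and a few covariates (the variable names and numbers below are hypothetical, and this is not CEGA’s actual model), one can fit a regression on the studies that report costs and use it to predict costs for those that do not:

```python
import numpy as np

# Hypothetical training data: studies that DO report costs.
# Covariates: log of program scale (thousands of participants)
# and an in-country wage index.
X = np.array([
    [np.log(5),  0.8],
    [np.log(20), 1.1],
    [np.log(50), 0.9],
    [np.log(8),  1.3],
])
log_unit_cost = np.array([3.2, 2.6, 2.1, 3.4])  # log of USD per participant

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, log_unit_cost, rcond=None)

# Predict the unit cost for a study that did not report costs.
new_study = np.array([1.0, np.log(30), 1.0])  # intercept, log scale, wage index
predicted = np.exp(new_study @ coef)
print(f"Predicted unit cost: ${predicted:.2f} per participant")
```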
As researchers adopt and use these cost reporting standards, CEGA will be able both to model and predict costs for a larger number of studies in meta-analysis and to contribute these estimates to publicly available datasets.
Using AI to Unlock Evidence Aggregation at Scale
Meta-analysis is the gold standard for generating the highest quality evidence of impact. By pooling effect size estimates across multiple studies, meta-analyses generate more statistically robust and generalizable conclusions than any single evaluation can provide. But conducting a rigorous meta-analysis remains resource-intensive. Extracting effect size data from dozens or hundreds of papers — each with different structures, outcome definitions, and reporting formats — requires substantial time and labor. The resources required present a barrier to how often meta-analyses get done and how quickly the field can synthesize accumulated evidence.
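For readers unfamiliar with the mechanics, the pooling step itself is simple once effect sizes have been extracted and harmonized; the expensive part is everything that comes before it. A minimal fixed-effect, inverse-variance-weighted sketch with invented numbers:

```python
import math

# Invented effect sizes (standardized mean differences) and standard errors.
effects = [0.25, 0.31, 0.12, 0.40]
ses = [0.10, 0.08, 0.15, 0.12]

weights = [1 / se**2 for se in ses]  # precision (inverse-variance) weights
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"Pooled effect: {pooled:.3f} "
      f"(95% CI {pooled - 1.96 * pooled_se:.3f} "
      f"to {pooled + 1.96 * pooled_se:.3f})")
```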
CEGA is working to lower that barrier. Alongside the Global Poverty Research Lab (GPRL) at Northwestern University, CEGA is leveraging the metadata schema, co-developed with the World Bank, AidGrade, and other partners, to automate data extraction with LLMs. This automated tool reduces the resources required to conduct a meta-analysis and makes it possible to construct large, open-access datasets of outcomes from social science papers.
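A hypothetical sketch of what schema-guided extraction can look like; the field names and the call_llm helper below are placeholders, not the actual schema or tooling used by CEGA and GPRL:

```python
import json

# Placeholder schema; the real metadata schema is far richer.
EXTRACTION_SCHEMA = {
    "intervention": "string",
    "country": "string",
    "sample_size": "integer",
    "outcome_name": "string",
    "effect_size": "number",
    "standard_error": "number",
    "cost_per_participant_usd": "number or null",
}

def build_prompt(paper_text: str) -> str:
    return (
        "Extract the following fields from the impact evaluation below. "
        "Return valid JSON matching this schema; use null for any field "
        f"that is not reported.\nSchema: {json.dumps(EXTRACTION_SCHEMA)}\n\n"
        f"Paper:\n{paper_text}"
    )

def extract_record(paper_text: str, call_llm) -> dict:
    """call_llm is any function that sends a prompt to an LLM and returns
    its text response (a placeholder for a real API client)."""
    record = json.loads(call_llm(build_prompt(paper_text)))
    # Human review of each record before it enters the dataset is essential.
    return record
```

Constraining model output to a fixed schema is what makes records extracted from very differently structured papers comparable enough to aggregate.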
Individually, each of these three efforts addresses a specific gap in the development evidence pipeline. Together, they address different aspects of the research process — study design, results reporting, and evidence aggregation and synthesis — that must all improve in order to fully leverage the insights of existing international development research.