Demand for evidence-based policy has grown significantly in the last decade. Government data systems play a critical role in making linked, statewide administrative data available so researchers can produce findings that shape education and other areas of public interest.
Research varies in how open and accessible it is. The incentives, norms, and institutions that govern social science don’t always reward transparency or the publication of outputs such as data, code, or the research itself, particularly when the findings aren’t statistically significant. Restricted-access data offers a unique opportunity to document and incentivize transparency. When government entities provide access to administrative data in a controlled environment, they can keep a time-stamped record of when specific data was made available for a specific use. Together, academic institutions and government agencies can normalize open science by making public the data and other inputs to an analysis, so that others can verify, build upon, and trust the findings.
WHAT IS OPEN SCIENCE?
Open science is a set of norms, practices, and tools that make research more transparent, reproducible, rigorous, and ethical. It’s not a single requirement, but rather a set of methods that can vary by discipline. Common practices include:
- Pre-registration: Researchers document their research questions and methods before accessing data. This reduces the risk that findings are shaped by what the data happens to show or what researchers want to find.
- Pre-analysis plans (PAPs): A PAP is a detailed plan submitted before analysis begins. It specifies how data will be analyzed, which prevents the cherry-picking of data and gives the research team a clear roadmap. A detailed PAP can also simplify analysis after data collection and make it easier to prepare findings for publication.
- Data documentation and codebooks: Clear documentation of how data were generated and what each variable means helps organize data and makes it accessible for other researchers.
- Version control and data sharing: Using platforms like GitHub or the Open Science Framework (OSF) makes it easier to collaborate on shared code, track changes to that code, and prepare clean code for publication.
- Transparent reporting and disclosure: Researchers clearly report data limitations, conflicts of interest, and analytical decisions to make it easier for others to reproduce or replicate their work.
- Replication and reproduction: Reproducing a study means re-analyzing the original data and code to verify the findings. Replicating a study means answering the same research question with new data. Both practices are important tools for making sure that research findings hold up.
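As one illustration of the data documentation practice above, a codebook can be kept as a small machine-readable file that travels with the data. The sketch below is a minimal example, not a standard: the variable names, the four codebook fields, and the helper functions `write_codebook` and `undocumented_columns` are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical codebook entries; variable names and descriptions are illustrative only.
CODEBOOK = [
    {"variable": "student_id", "type": "string",
     "description": "De-identified student identifier", "source": "enrollment file"},
    {"variable": "grade_level", "type": "integer",
     "description": "Grade level at fall enrollment", "source": "enrollment file"},
]

FIELDNAMES = ["variable", "type", "description", "source"]


def write_codebook(entries, stream):
    """Write codebook entries as CSV to any text stream (file or buffer)."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(entries)


def undocumented_columns(data_columns, entries):
    """Return the data columns that have no codebook entry, in original order."""
    documented = {entry["variable"] for entry in entries}
    return [col for col in data_columns if col not in documented]


if __name__ == "__main__":
    buffer = io.StringIO()
    write_codebook(CODEBOOK, buffer)
    print(buffer.getvalue())
    # Flag any column in a (hypothetical) extract that the codebook does not describe.
    print(undocumented_columns(["student_id", "grade_level", "test_score"], CODEBOOK))
```

A check like `undocumented_columns` could be run before data or code is shared, so that gaps in documentation are caught early rather than by the next researcher.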
OPEN SCIENCE AND ADMINISTRATIVE DATA
Government entities that provide access to data in controlled environments can support open science practices by creating infrastructure that records, with a time and date stamp, when specific data was made available for each research project. This can both strengthen the scientific integrity of the research and increase trust in the neutrality and transparency of the data access process.
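A time- and date-stamped access record of this kind could look something like the sketch below. It is a minimal illustration under stated assumptions, not a description of any agency's actual system: the field names, the project and dataset identifiers, and the functions `make_access_record` and `released_after` are hypothetical. The record pairs a UTC timestamp with a SHA-256 hash of the released extract, so the exact file can be identified later without storing the data in the log.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_access_record(project_id, dataset_name, data_bytes, released_at=None):
    """Build a record of when a specific extract was released to a project.

    All field names are hypothetical. The hash identifies the exact bytes
    released without copying the data itself into the log.
    """
    if released_at is None:
        released_at = datetime.now(timezone.utc)
    return {
        "project_id": project_id,
        "dataset_name": dataset_name,
        "released_at_utc": released_at.isoformat(),
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
    }


def released_after(record, registered_at):
    """Return True if the data was released after the given registration time."""
    released = datetime.fromisoformat(record["released_at_utc"])
    return released > registered_at


if __name__ == "__main__":
    # Hypothetical project: pre-registered on 1 February, data released on 1 March.
    record = make_access_record(
        "P-001",
        "enrollment_extract",
        b"student_id,grade_level\nA1,4\n",
        released_at=datetime(2024, 3, 1, tzinfo=timezone.utc),
    )
    print(json.dumps(record, indent=2))
    print(released_after(record, datetime(2024, 2, 1, tzinfo=timezone.utc)))
```

Comparing the release time with the date a pre-registration or pre-analysis plan was filed is one way such a record could show that the plan was written before the researchers saw the data.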