Artificial Intelligence for Enhancing Data Quality, Standardization & Integration
Challenge
Informed policy decisions require high-quality data, but many data sources are inconsistent, incomplete, or difficult to use.
Data sources must go through a range of assessment, processing, and standardization steps before they can be used easily and accurately for analysis. These steps are necessary even for structured survey data, and they are more extensive still for nontraditional data sources such as administrative records, geospatial data, or sensor data.
The 2022 CHIPS and Science Act authorized a five-year demonstration project to explore the implementation of a future National Secure Data Service (NSDS), which will support evidence-based decision-making by improving the accessibility and usability of federal, state, and local government data assets. This project is part of the NSDS roadmap for developing a future toolkit that could streamline data preparation activities for the federal statistical system and promote the development of high-quality data.
Solution
NORC is exploring innovative applications of artificial intelligence to reduce the resources required to create high-quality data.
NORC’s solution begins with identifying the areas where AI holds the most promise for streamlining data preparation activities. We draw on interviews with federal statistical experts and with subject matter experts in data quality, privacy, and ethics, as well as on the literature and our own experience preparing high-quality data sources. Our assessment will consider the types of data that require preparation, the challenges those data sources present, existing tools, and relevant ethical or privacy concerns.
Result
Our toolkit will increase the quality and scope of data available to build evidence and support decision-making.
After identifying high-priority use cases for automation to support data standardization, integration, and quality, we are building, documenting, and packaging a toolkit that addresses these needs. Our toolkit will improve the accessibility and quality of data sources, especially nontraditional sources such as administrative records or geospatial data, which hold particular potential to shed new light on important policy questions and evidence gaps.
About the Project
The National Center for Science and Engineering Statistics (NCSES) has awarded $746,750 to NORC to support this effort. The funding is provided through America's DataHub Consortium's Other Arrangement. (Period of Performance: September 2024 – April 2026)
Project Leads
Emily R. Wiegand
Senior Data Scientist, Project Director
Beth Fisher
Senior Research Director I, Senior Staff
Zachary Seeskin
Senior Statistician, Senior Staff
Mehmet Celepkolu
Senior Data Scientist, Senior Staff
Nola du Toit
Senior Research Methodologist & Data Visualization Specialist, Senior Staff