Artificial Intelligence for Enhancing Data Quality, Standardization & Integration

Applying innovative data science methods to create datasets that support evidence-based decision-making
  • Client: National Center for Science and Engineering Statistics
  • Dates: 2024 – 2026

Challenge

Informed policy decisions require high-quality data, but many data sources are inconsistent, incomplete, or difficult to use.

Data sources must go through an array of assessment, processing, and standardization steps before they can be used easily and accurately for analysis. These activities are necessary even for structured survey data, and they are more extensive still for nontraditional data sources such as administrative records, geospatial data, or sensor data.

The Advisory Committee on Data for Evidence Building (ACDEB) and the 2022 CHIPS and Science Act provide authority and recommendations for developing a future National Secure Data Service (NSDS), which will support evidence-based decision-making by improving access to and usability of federal, state, and local government data assets. This demonstration project is part of the National Secure Data Service roadmap. The National Secure Data Service Demonstration (NSDS-D) envisions a future shared service that can streamline data preparation activities for the federal statistical system and promote the development of high-quality data.

Solution

NORC is exploring innovative applications of artificial intelligence to reduce the resources required to create high-quality data.

NORC’s solution begins with identifying the most promising areas for AI to streamline data preparation activities. We draw on interviews with federal statistical experts and subject matter experts in data quality, privacy, and ethics, as well as the literature and our own experience preparing high-quality data sources. Our assessment will consider the types of data that require preparation, the challenges those data sources present, existing tools, and relevant ethical or privacy concerns.

Result

Our toolkit will increase the quality and scope of data available to build evidence and support decision-making.

After identifying high-priority use cases for automation to support data standardization, integration, and quality, we are building, documenting, and packaging a toolkit that addresses these needs. Our toolkit will improve the accessibility and quality of data sources, especially nontraditional sources such as administrative records or geospatial data, which hold particular potential to shed new light on important policy questions and evidence gaps.
