
Linking Parent & Statistical Agency Data

Linking NCSES SED and NSF PI data to inform future linkages between a statistical agency and its parent agency
  • Client
    National Center for Science and Engineering Statistics within NSF
  • Dates
    2023 – 2025

Problem

The National Center for Science and Engineering Statistics wanted to demonstrate how privacy-protected record linkage can be used to link two data sources.

Under the 2022 CHIPS and Science Act, the National Secure Data Service Demonstration (NSDS-D) project is an effort to strengthen data linkage and data access infrastructure. The National Center for Science and Engineering Statistics (NCSES) within the National Science Foundation (NSF) wanted to conduct a research demonstration project with three goals: develop a data sharing agreement, use a privacy-protected record linkage tool to link two disparate sources, and create an analytic dataset that can answer questions neither source could answer alone.

Linking data between a federal statistical agency and its parent agency raises its own set of considerations. In addition, many privacy-protected record linkage (PPRL) tools exist, and their benefits differ depending on the nature of the linkage and the data to be linked. NCSES wanted to demonstrate a linkage in order to develop guidance and lessons learned for future linkages between a federal statistical agency and its parent agency.

Solution

NORC linked NCSES SED and NSF PI data using an open-source PPRL tool.

To demonstrate a linkage between a statistical agency and its parent agency, NORC linked the following two data sources:

  • NCSES Survey of Earned Doctorates (SED)
  • NSF Principal Investigator (PI) award data

Working closely with NCSES and NSF staff, NORC developed a data sharing agreement and documented the considerations specific to an agreement between a statistical agency and its parent agency. As part of developing the agreement, a process flow was created to present a suggested infrastructure that identifies responsibilities for data ownership, storage, processing, and linking. This infrastructure allows the sources to be linked with PPRL without ever exchanging direct personally identifiable information (PII).
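
To make the idea of linking without exchanging direct PII concrete, the sketch below shows one common PPRL pattern: each data owner turns its identifiers into keyed hash tokens locally, and only the tokens and record IDs are compared. This is an illustration only, not the tool or method NORC selected. The field names, record IDs, sample values, and the shared key are all hypothetical, and real PPRL tools typically add safeguards and fuzzy matching that this exact-match example omits.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Strip accents, case, and punctuation so formatting differences do not block a match."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(secret_key: bytes, first: str, last: str, birth_date: str) -> str:
    """Build a keyed hash (HMAC-SHA256) of the normalized identifiers."""
    message = "|".join(normalize(p) for p in (first, last, birth_date)).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def tokenize(records, secret_key):
    """Run by each data owner on its own system; returns {token: record_id}."""
    return {
        make_token(secret_key, r["first"], r["last"], r["dob"]): r["id"]
        for r in records
    }


def link(tokens_a, tokens_b):
    """Run by the linking party, which sees only tokens and record IDs."""
    shared = tokens_a.keys() & tokens_b.keys()
    return sorted((tokens_a[t], tokens_b[t]) for t in shared)


# Hypothetical key agreed between the two data owners and withheld from the linker.
SHARED_KEY = b"hypothetical-shared-secret"

# Made-up records standing in for the two sources.
sed_records = [
    {"id": "SED-1", "first": "María", "last": "O'Neil", "dob": "1985-03-02"},
    {"id": "SED-2", "first": "John", "last": "Smith", "dob": "1979-11-30"},
]
pi_records = [
    {"id": "PI-9", "first": "maria", "last": "ONeil", "dob": "1985-03-02"},
    {"id": "PI-7", "first": "Jane", "last": "Doe", "dob": "1990-01-15"},
]

pairs = link(tokenize(sed_records, SHARED_KEY), tokenize(pi_records, SHARED_KEY))
print(pairs)  # [('SED-1', 'PI-9')]
```

Because the tokens are derived with a secret key, the party performing the join cannot recompute them from guessed names. The trade-off is that exact tokens only match when the normalized identifiers agree completely, which is one reason the PII available in each source matters when choosing a tool.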

NORC also considered both open-source and commercial PPRL software to identify the considerations and precautions that apply when selecting linkage software, such as the strengths or limitations of a particular tool given the PII available in the source data.

Result

The project is currently developing a recommended linkage strategy and guidance.

A data sharing agreement, along with guidance on selecting an appropriate PPRL tool and linkage strategy, is being delivered throughout the project lifecycle. A final methodology report will detail the PPRL tool selection and the methods used to link NCSES SED and NSF PI data, as well as guidance and lessons learned to inform future linkages. A statistical analysis report will detail specific analyses of the final linked SED-PI data and describe what is needed to assess the feasibility of analyzing linked data in a secure environment to support evidence-based policymaking.
