
Linking Federal Health Data While Protecting Privacy

Proof that data sets can be combined without disclosing sensitive information
  • Client
    National Center for Health Statistics (NCHS)
  • Dates
    2020 - 2022

Problem

Combining sets of health records while maintaining privacy is a challenge. 

The government collects and stores data on each of the more than 80 million people enrolled in Medicare, Medicaid, and the Children's Health Insurance Program. On their own, these data can help policymakers and practitioners provide better health care. But the data would be even more useful if linked with other records that hold additional personal health information. The problem is that those individual health records contain names, dates of birth, home addresses, and other identifiers that must be kept confidential.

Solution

NORC tested using “hashing” to link records while maintaining privacy.

The National Center for Health Statistics (NCHS) asked NORC at the University of Chicago to conduct a two-part test. The first part tested whether datasets could be linked using encrypted identifiers instead of names, a method known as hashing. The second part tested whether hashing accurately connected records belonging to the same individual, thereby producing combined datasets that were trustworthy. We used software from Datavant for the encryption. We then evaluated statistical validity using machine learning to compare the matches to those in a gold-standard linkage we had previously done for NCHS based on personal identification information. Separately, using personal identification information, we linked records between the National Hospital Care Survey and the Transformed Medicaid Statistical Information System to provide more insightful data for policy analysis.
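The general idea of linking on hashed identifiers can be illustrated with a minimal sketch. This is not Datavant's software or the method NORC used; it assumes a keyed SHA-256 hash (HMAC) over normalized name and date-of-birth fields, and the secret key, field names, and sample records are all hypothetical.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical shared secret (an assumption for illustration only);
# a real deployment would manage keys securely.
SECRET_KEY = b"example-shared-secret"

ID_FIELDS = ("first", "last", "dob")  # hypothetical identifier field names


def normalize(value):
    """Lowercase and strip accents and punctuation so trivial variants hash alike."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(record, key=SECRET_KEY):
    """Return a keyed SHA-256 digest of the record's normalized identifiers."""
    message = "|".join(normalize(record[field]) for field in ID_FIELDS)
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records):
    """Map token -> record with the identifying fields dropped."""
    return {
        make_token(rec): {k: v for k, v in rec.items() if k not in ID_FIELDS}
        for rec in records
    }


def link(left, right):
    """Join two tokenized datasets on the tokens they share."""
    shared = sorted(left.keys() & right.keys())
    return [{"token": t, **left[t], **right[t]} for t in shared]


# Made-up sample records from two sources.
survey = [
    {"first": "María", "last": "O'Neil", "dob": "1980-02-03", "visits": 4},
    {"first": "John", "last": "Smith", "dob": "1975-11-30", "visits": 1},
]
claims = [
    {"first": "maria", "last": "ONeil", "dob": "1980-02-03", "plan": "A"},
    {"first": "Jane", "last": "Doe", "dob": "1990-07-15", "plan": "B"},
]

linked = link(tokenize(survey), tokenize(claims))
for row in linked:
    print(row["visits"], row["plan"])
```

In this sketch, each data holder would tokenize its own records, so only tokens and non-identifying fields are brought together for the join. Exact-match hashing of this kind misses records whose identifiers differ by more than formatting, which is one reason the accuracy of the resulting links has to be tested.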

Result

We showed data sharing and personal privacy protection are not mutually exclusive. 

The project demonstrated that encryption could be employed to reliably and efficiently link datasets while safeguarding people’s privacy. Our use of supervised machine learning was cited in a 2021 peer-reviewed journal article, and the federal government is conducting further tests of privacy-preserving linkages.
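One simple way to quantify how reliably a hashed linkage reproduces a gold-standard linkage is to compare the two sets of matched record pairs. The sketch below is an illustration only: the project description does not state which metrics were used, so precision, recall, and F1 are assumptions here, and the record identifiers are invented.

```python
def linkage_metrics(candidate_pairs, gold_pairs):
    """Compare candidate record pairs against gold-standard pairs."""
    candidate, gold = set(candidate_pairs), set(gold_pairs)
    true_positives = len(candidate & gold)
    # Precision: share of candidate links that the gold standard confirms.
    precision = true_positives / len(candidate) if candidate else 0.0
    # Recall: share of gold-standard links that the candidate linkage found.
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented (survey_id, claims_id) pairs for illustration.
gold = {("s1", "c1"), ("s2", "c2"), ("s3", "c3"), ("s4", "c4")}
hashed = {("s1", "c1"), ("s2", "c2"), ("s3", "c9")}

metrics = linkage_metrics(hashed, gold)
print(metrics)
```

Here two of the three hashed links agree with the gold standard, and two of the four gold-standard links are recovered.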

Project Leads

“We’ve demonstrated a capability for building privacy-protected files that can be used for evidence-based policy making.”

Principal Data Scientist

