Linking Federal Health Data While Protecting Privacy
Problem
Combining sets of health records while maintaining privacy is a challenge.
The government collects and stores data on each of the more than 80 million people enrolled in Medicare, Medicaid, and the Children's Health Insurance Program. On its own, this data can help policymakers and practitioners provide better health care. But the data would be even more useful if linked with other records that hold additional personal health information. The problem is that those individual health records contain names, dates of birth, home addresses, and other identifiers that must be kept confidential.
Solution
NORC tested using “hashing” to link records while maintaining privacy.
The National Center for Health Statistics (NCHS) asked NORC at the University of Chicago to conduct a two-step test. The first step tested whether datasets could be linked using encrypted identifiers instead of names, a method known as hashing. The second step tested whether hashing accurately connected records belonging to the same individual, thereby producing combined datasets that were trustworthy. We used software from Datavant for the encryption. We then evaluated statistical validity, using machine learning to compare the hashed matches with those in a gold-standard linkage we had previously done for NCHS based on personal identification information. Separately, using personal identification information, we linked records between the National Hospital Care Survey and the Transformed Medicaid Statistical Information System to provide more insightful data for policy analysis.
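The two steps described above can be illustrated with a minimal sketch. It is not Datavant's software or the method NORC used: the keyed hash (HMAC-SHA256), the normalization rules, the `SECRET_KEY` value, the function names, and the sample records are all assumptions made up for illustration. The second step is shown here as a simple precision and recall comparison against a gold-standard set of links, a stand-in for the machine learning evaluation mentioned above.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical shared secret; a real deployment would manage keys outside the code.
SECRET_KEY = b"example-key-not-for-production"


def normalize(value: str) -> str:
    """Drop accents, case, and whitespace so trivial spelling variants agree."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ascii_only.lower().split())


def make_token(first: str, last: str, dob: str, key: bytes = SECRET_KEY) -> str:
    """Turn identifiers into a keyed hash so the raw values are never shared."""
    message = "|".join(normalize(part) for part in (first, last, dob))
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def link(left: dict, right: dict) -> set:
    """Pair record IDs from two datasets whose tokens are identical."""
    ids_by_token = {}
    for record_id, token in right.items():
        ids_by_token.setdefault(token, []).append(record_id)
    return {
        (left_id, right_id)
        for left_id, token in left.items()
        for right_id in ids_by_token.get(token, [])
    }


def precision_recall(predicted: set, gold: set) -> tuple:
    """Compare hashed links with gold-standard links made from clear-text identifiers."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up records: each dataset holds only record IDs and tokens.
survey = {
    "s1": make_token("Ana", "López", "1980-02-03"),
    "s2": make_token("John", "Smith", "1975-11-30"),
}
claims = {
    "c1": make_token(" ana ", "LOPEZ", "1980-02-03"),  # formatting differs only
    "c2": make_token("Jon", "Smith", "1975-11-30"),    # spelling differs
}

pairs = link(survey, claims)
gold = {("s1", "c1"), ("s2", "c2")}
precision, recall = precision_recall(pairs, gold)
print(pairs, precision, recall)
```

In this toy data the formatting variant still links, but the misspelled first name does not, because exact token matching cannot tolerate a changed character. That gap between hashed links and gold-standard links is the kind of difference an accuracy evaluation is meant to measure.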
Result
We showed data sharing and personal privacy protection are not mutually exclusive.
The project demonstrated that encryption could be employed to reliably and efficiently link datasets while safeguarding people’s privacy. Our use of supervised machine learning was cited in a 2021 peer-reviewed journal article, and the federal government is conducting further tests of privacy-preserving linkages.
Project Leads
- Edward Mulrow, Senior Vice President & Director (Project Director)
- Dean Resnick, Principal Data Scientist (Principal Investigator)
- Chris Cox, Senior Fellow (Senior Staff)
- Scott Campbell, Senior Statistician (Senior Staff)