Linking National Hospital Care Survey and CMS Data
The National Center for Health Statistics wanted to link several data sources while maintaining privacy.
The National Center for Health Statistics (NCHS) wanted to evaluate commercially available privacy-preserving record linkage (PPRL) software to determine whether it could link data with accuracy similar to that of non-privacy-preserving methods. NCHS’s interest in PPRL methods stems from its expectation that, for certain proposed linkages, data custodians may be unable (e.g., because of legal restrictions) or unwilling to share complete files or sensitive data elements. PPRL offers a way to complete linkages under these limitations. NCHS expected that limited access to sensitive identifying data elements would diminish linkage results somewhat, but by how much?
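NCHS also sought to increase the analytic utility of the 2014 and 2016 National Hospital Care Survey (NHCS) data by linking them with claims and enrollment data from the Centers for Medicare & Medicaid Services (CMS) Transformed Medicaid Statistical Information System (T-MSIS). The linked data allow researchers to analyze the health and health outcomes of persons enrolled in a means-tested government healthcare program.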
NORC provided technical support to the Data Linkage Team at the National Center for Health Statistics (NCHS).
The National Center for Health Statistics (NCHS) asked NORC to:
- Evaluate privacy-preserving record linkage techniques by re-performing the linkage between the National Hospital Care Survey (NHCS, 2016) and the National Death Index (NDI) using encrypted identifiers with Datavant software
- Conduct linkage between NHCS (2014 and 2016) and Centers for Medicare & Medicaid Services (CMS) Transformed Medicaid Statistical Information System (T-MSIS) data containing enrollment and claims data from a range of years
We performed a methodological assessment of Datavant software for conducting privacy-preserving record linkage (PPRL). The project assessed how the use of hashing algorithms might affect the quality of the linked data and the inferences drawn in secondary analyses of those data (i.e., the accuracy of tabulations and analyses made with linked data from the PPRL approach).
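To make the idea concrete, here is a minimal sketch of hash-based linkage and of how its output might be scored against a clear-text reference linkage. It is not Datavant's algorithm, which this page does not describe. The shared key, the choice of fields, the sample records, and every function name below are illustrative assumptions.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret shared by the data custodians; never a real key.
SECRET_KEY = b"demo-key-shared-by-both-data-custodians"


def normalize(value: str) -> str:
    """Strip accents and non-alphanumeric characters, then uppercase."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if ch.isalnum()).upper()


def make_token(record: dict) -> str:
    """Keyed hash (HMAC-SHA256) of last name, first initial, DOB, and sex."""
    parts = [
        normalize(record["last_name"]),
        normalize(record["first_name"])[:1],
        normalize(record["dob"]),
        normalize(record["sex"]),
    ]
    message = "|".join(parts).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def link_on_tokens(left: list, right: list) -> set:
    """Return (left_id, right_id) pairs whose tokens match exactly."""
    index = {}
    for rec in right:
        index.setdefault(make_token(rec), []).append(rec["id"])
    return {
        (rec["id"], right_id)
        for rec in left
        for right_id in index.get(make_token(rec), [])
    }


def linkage_quality(pprl_pairs: set, reference_pairs: set) -> tuple:
    """Precision and recall of PPRL links relative to a reference linkage."""
    true_links = len(pprl_pairs & reference_pairs)
    precision = true_links / len(pprl_pairs) if pprl_pairs else 0.0
    recall = true_links / len(reference_pairs) if reference_pairs else 0.0
    return precision, recall


# Made-up records standing in for a survey file and an administrative file.
survey = [
    {"id": "S1", "last_name": "O'Brien", "first_name": "Mary", "dob": "1950-03-02", "sex": "F"},
    {"id": "S2", "last_name": "Garcia", "first_name": "José", "dob": "1962-11-30", "sex": "M"},
    {"id": "S3", "last_name": "Smith", "first_name": "Jon", "dob": "1971-07-15", "sex": "M"},
]
admin = [
    {"id": "A1", "last_name": "OBRIEN", "first_name": "MARY", "dob": "1950-03-02", "sex": "F"},
    {"id": "A2", "last_name": "GARCIA", "first_name": "JOSE", "dob": "1962-11-30", "sex": "M"},
    {"id": "A3", "last_name": "SMYTH", "first_name": "JON", "dob": "1971-07-15", "sex": "M"},
]

pprl_pairs = link_on_tokens(survey, admin)

# Assumed result of a clear-text linkage that tolerates the Smith/Smyth variant.
reference_pairs = {("S1", "A1"), ("S2", "A2"), ("S3", "A3")}

precision, recall = linkage_quality(pprl_pairs, reference_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, normalization reconciles punctuation, case, and accent differences, but the spelling variant Smith/Smyth produces a different hash and the link is missed. Missed links of this kind lower recall, while false links would lower precision; these correspond to the two kinds of linkage error such an assessment would quantify.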
The second part of the project linked NHCS patient records with T-MSIS claims and encounter data, creating a database of detailed health insurance claims data for all NHCS patients receiving health insurance coverage from Medicare and Medicaid, the two largest U.S. public health insurance programs.
NORC’s groundbreaking data linkage provides myriad benefits to health care research.
This new data resource supports patients, caregivers, and providers as they strive to improve health, prevent chronic disease, and improve the efficacy and quality of health care services. The project expanded data capacity for studies of HHS priority issues such as opioids, obesity, and infectious diseases, particularly among the Medicaid-covered population, in a way that no single data source could. It also supported a wide array of health outcomes research studies, such as examining differences in the efficiency and effectiveness of treatment protocols or post-acute care utilization among patients covered by Medicare fee-for-service, Medicare Advantage, and Medicaid programs. The project also provides a rich data source for researchers examining the association between health and housing.
Edward Mulrow, Senior Vice President & Director, Project Director
Dean Resnick, Principal Data Scientist, Principal Investigator
Chris Cox, Senior Fellow, Senior Staff
Scott Campbell, Senior Statistician, Senior Staff
“Measurement of Type I and Type II Record Linkage Error.”
Presentation | August 4, 2021
“Using Supervised Machine Learning to Identify Efficient Blocking Schemes for Record Linkage.”
Journal Article | August 4, 2021
“A Methodological Assessment of Privacy Preserving Record Linkage using Survey and Administrative Data.”
Journal Article | June 7, 2022