
Graduate Research Fellowship Program Pilot Project

Innovating data collection methods to track National Science Foundation research fellowship outcomes
  • Client
    National Science Foundation
  • Dates
    2015 - 2021


The National Science Foundation relied on outdated methods to track research fellowship outcomes.

The National Science Foundation’s (NSF) Graduate Research Fellowship Program (GRFP) helps ensure the quality, vitality, and diversity of the scientific and engineering workforce of the United States. It seeks to broaden the participation of underrepresented groups in science and engineering, including women, minorities, persons with disabilities, and veterans.

For decades, the NSF relied on traditional survey methodologies to track the career outcomes of GRFP awardees. While these surveys obtained valuable data on academic experiences and careers, the NSF sought more cost-efficient and accurate methodologies that would take advantage of new data sources and inform future evaluation approaches. 


NORC developed innovative software that scraped data from online sources.

NORC had been working with the NSF and already had a database of more than 28,000 STEM graduates. We selected a subset of these graduates and gathered career information on them, including academic achievements, employment, publications, patents, and grants. NORC developed and refined software to scrape this information from online sources, incorporating an Application Programming Interface (API), which allows software applications to communicate with each other. We used several metrics to verify the accuracy of the new software, including return rate, precision, and accuracy. We also collected a dataset by hand and compared it with the scraped information, finding that the results were very similar.
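The comparison between scraped and hand-collected records can be illustrated with a minimal Python sketch. The source does not define the three metrics, so the definitions below are assumptions: return rate is the share of individuals for whom the scraper returned any value, precision is the share of returned values that match the hand-collected value, and accuracy is the share of all individuals for whom the two sources agree (including cases where both found nothing). The function name `evaluate_scrape` and the record layout are hypothetical, not NORC's software.

```python
from typing import Dict, Optional


def evaluate_scrape(
    scraped: Dict[str, Optional[str]],
    manual: Dict[str, Optional[str]],
) -> Dict[str, float]:
    """Compare scraped values with a hand-collected reference set.

    Both arguments map a person ID to one field value (for example,
    current employer), or None when no value was found. The metric
    definitions are assumptions made for this illustration.
    """
    ids = list(manual)
    if not ids:
        raise ValueError("manual reference set is empty")

    # Individuals for whom the scraper returned something.
    returned = [i for i in ids if scraped.get(i) is not None]
    # Returned values that match the hand-collected value.
    correct = [i for i in returned if scraped[i] == manual[i]]
    # Individuals where both sources agree, including "nothing found".
    agree = [i for i in ids if scraped.get(i) == manual[i]]

    return {
        "return_rate": len(returned) / len(ids),
        "precision": len(correct) / len(returned) if returned else 0.0,
        "accuracy": len(agree) / len(ids),
    }


if __name__ == "__main__":
    # Made-up records, for illustration only.
    manual = {"a": "MIT", "b": "NASA", "c": None, "d": "IBM", "e": "Google"}
    scraped = {"a": "MIT", "b": "ESA", "c": None, "e": "Google"}
    print(evaluate_scrape(scraped, manual))
```

Exact string equality is the simplest possible comparison; real records would likely need normalization (case, abbreviations, institution aliases) before matching.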


NORC reduced data collection time and improved accuracy.

The software NORC developed reduces data collection time by months and produces accurate results. The pilot also demonstrated that a data collection strategy employing APIs and public databases can produce valid data more efficiently than traditional surveys, and that assessing scholarly output using automated techniques is feasible and reliable.

By bypassing traditional survey methodologies, this new data collection effort reduces the nonresponse bias that can arise when some people do not respond to a survey. Using machine learning techniques, the software more accurately determines which authors and publications correspond to a specific individual and which do not.
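To make the matching problem concrete, here is a minimal Python sketch of deciding whether a publication's author entry refers to a given fellow. It is deliberately not the machine learning approach described above, whose details the source does not give. Instead it uses a simple hand-weighted score as a stand-in. All names (`Person`, `PublicationAuthor`, `match_score`), the weights, and the 0.7 threshold are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Person:
    first: str
    last: str
    institutions: FrozenSet[str]  # institutions the person is known to have attended or worked at


@dataclass(frozen=True)
class PublicationAuthor:
    name: str         # as printed on the publication, e.g. "J. Smith"
    affiliation: str  # free-text affiliation string


def match_score(person: Person, author: PublicationAuthor) -> float:
    """Return a score in [0, 1]; higher means a more likely match.

    Weights are illustrative assumptions, not learned parameters.
    """
    tokens = author.name.replace(".", " ").lower().split()
    if not tokens:
        return 0.0
    first_tok, last_tok = tokens[0], tokens[-1]
    first, last = person.first.lower(), person.last.lower()

    # A different last name rules the candidate out.
    if last_tok != last:
        return 0.0

    if first_tok == first:
        score = 0.6  # full first name agrees
    elif len(first_tok) == 1 and first.startswith(first_tok):
        score = 0.4  # only an initial is available, and it agrees
    else:
        return 0.0   # first names conflict

    # A known institution appearing in the affiliation adds evidence.
    affiliation = author.affiliation.lower()
    if any(inst.lower() in affiliation for inst in person.institutions):
        score += 0.4
    return score


def is_match(person: Person, author: PublicationAuthor, threshold: float = 0.7) -> bool:
    """Apply an assumed decision threshold to the score."""
    return match_score(person, author) >= threshold
```

A trained classifier would replace the fixed weights with parameters estimated from labeled matches, such as a hand-verified dataset, and could use further signals like co-authors, research field, and publication dates.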

The GRFP pilot project provides a roadmap for other avenues of study, including:

  • Sequencing of data collection methods to improve accuracy and efficiency
  • Exploring more machine learning techniques to validate records
  • Assessing intellectual productivity through advanced models
  • Using similar approaches to evaluate other NSF initiatives

Project Leads

“We are developing powerful new, low-cost data collection techniques that have the potential to vastly increase the size and diversity of samples for our clients.”

Vice President

