
Monitoring Data Use to Demonstrate Impact

Research Brief

Author

Sara Lafia
Research Methodologist, Methodology & Quantitative Social Sciences

December 2024

The rise in data-intensive research is transforming data sharing infrastructure.

Scientific discoveries and breakthroughs are increasingly data intensive.[1] Government agencies, research funders, and data providers have recently taken steps to enhance access to the data that they produce and are turning to analytics to understand the impact of the research findings that their data support. 

There is a growing interest in understanding the role of data in enabling research progress. What is the impact of data sharing, and what more can be done to accelerate data-driven discovery across the research ecosystem? Monitoring and measuring the impact of data use requires addressing three interrelated areas: data sharing, data access, and data citation.

“My work at NORC investigates how we can meet the growing need for methods and tools to support data users as they discover data and demonstrate evidence-based research impact.”

Sara Lafia, Research Methodologist, Methodology & Quantitative Social Sciences


Encouraging Data Sharing

Funders such as the National Institutes of Health[2] and the National Science Foundation[3] require awardees to adhere to data management and sharing plans, including providing access to eligible primary data collected during funded work. 

Many publishers have also adopted data-sharing requirements as a condition of publication. Data-sharing requirements may be satisfied by depositing data in a repository, which distributes and ensures persistent access to the data. Sharing data and the metadata that describe them makes it possible for future users to discover and reuse existing data to generate new scientific insights as well as replicate, validate, and extend previous findings.

Innovations in Data Citation

When data are used in an analysis, credit should be given to the creator, ideally through a citation. Assigning persistent identifiers, such as digital object identifiers (DOIs), to datasets makes it easier for users to cite data and for data providers to track citations made to data over time. DOIs are persistent unique identifiers, or PIDs, that enable unambiguous references to a wide variety of objects, including publications and datasets. Datasets with DOIs can readily be linked to authors, institutions, and other kinds of entities in a growing citation graph. 

In 2019, DataCite, a leading data infrastructure provider, reported that its PID graph contained over 6 million datasets, 4 million publications, 55,000 researchers, and 19,000 funders.[4] The use of PIDs such as Open Researcher and Contributor IDs (ORCIDs) to identify authors and Research Organization Registry (ROR) identifiers to identify institutions makes it easier for stakeholders to answer questions about how data are being used and by whom. 
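The value of PIDs comes from their predictable structure: each type has a well-defined format and a public resolver that turns the identifier into an unambiguous link. The sketch below, a minimal illustration rather than any official DataCite or ORCID tool, checks common PID formats and builds their resolver URLs:

```python
import re

# Minimal sketch: validate common persistent identifier (PID) formats
# and build resolvable URLs for them. The patterns cover typical cases,
# not every edge case in the underlying specifications.
PID_PATTERNS = {
    # DOIs begin with a "10." prefix, then a registrant code and suffix.
    "doi": re.compile(r"^10\.\d{4,9}/\S+$"),
    # ORCIDs are 16 digits in four hyphenated groups; the last may be X.
    "orcid": re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$"),
    # ROR identifiers are 9-character strings beginning with 0.
    "ror": re.compile(r"^0[a-z0-9]{8}$"),
}

RESOLVERS = {
    "doi": "https://doi.org/",
    "orcid": "https://orcid.org/",
    "ror": "https://ror.org/",
}

def resolve_url(pid_type: str, value: str) -> str:
    """Return a resolver URL for a well-formed PID, or raise ValueError."""
    pattern = PID_PATTERNS.get(pid_type)
    if pattern is None or not pattern.match(value):
        raise ValueError(f"not a well-formed {pid_type}: {value!r}")
    return RESOLVERS[pid_type] + value

# Example: a dataset DOI becomes an unambiguous, citable link.
print(resolve_url("doi", "10.17226/26532"))
# https://doi.org/10.17226/26532
```

Because every PID type resolves the same way, a citation graph can treat datasets, authors, and institutions uniformly as linked nodes.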

Increasing Public Data Access

Federal, state, and local governments continually fund the production of high-value data across a range of areas, including public health, the economy, and the environment. There is a growing interest across government that data produced by government entities should also be made more accessible for discovery and reuse. 

For example, the 2022 OSTP public access memorandum, known as the Nelson Memo,[5] requires all federal grantmaking agencies to implement plans for public data sharing by 2025. The Advisory Committee on Data for Evidence Building[6] has made similar recommendations. The 2022 CHIPS and Science Act[7] authorized a demonstration project to develop data-sharing infrastructure and tools as part of a future National Secure Data Service (NSDS),[8] envisioned to provide users with enhanced services for finding and using publicly available data for evidence building.

In support of these recommendations, NORC is currently developing models to support data discovery and prototyping a federal data usage platform to inform a potential future NSDS.



Demonstrating Data Impact

Are data-sharing efforts paying off? Ideally, it would be easy for data users to discover, analyze, and cite any publicly available dataset. Data stakeholders would be able to track citations made to datasets and learn about how the use of their data is shifting over time. However, the state of data citation is not ideal; many data providers have not yet been able to assign DOIs to their data and users often neglect to properly acknowledge data in their research outputs.[9]

Developments in artificial intelligence (AI) and natural language processing, coupled with large-scale search indexes, can be leveraged to close this gap by surfacing citation contexts for data in the text of publications, policy documents, and other outputs.[10] Agencies such as the USDA[11] have put forward dashboards that track data citations in research publications, which help demonstrate returns on the investments made to produce and share data.
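The detection task can be pictured simply: scan the text of a publication for mentions of known dataset names or their common aliases. The toy matcher below is only an illustration of that idea, not the NLP pipeline cited above; production systems use trained models over large search indexes to find mentions that simple keyword matching would miss.

```python
import re

# Toy illustration of surfacing informal data references in text.
# The dataset names and aliases below are examples, not a real registry.
DATASET_ALIASES = {
    "General Social Survey": ["General Social Survey", "GSS"],
}

def find_data_mentions(text: str) -> list[tuple[str, str]]:
    """Return (dataset, matched alias) pairs found in the text."""
    mentions = []
    for dataset, aliases in DATASET_ALIASES.items():
        for alias in aliases:
            # Word boundaries avoid matching inside longer tokens.
            if re.search(rf"\b{re.escape(alias)}\b", text):
                mentions.append((dataset, alias))
    return mentions

sample = "We analyze attitudes using GSS data from 1972 to 2022."
print(find_data_mentions(sample))
# [('General Social Survey', 'GSS')]
```

Matches like these, aggregated across a corpus of publications, are the raw material for the usage dashboards and bibliographies described below.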

Additionally, data archives such as the Inter-university Consortium for Political and Social Research (ICPSR) develop bibliographies of data-related literature, which provide valuable evidence to stakeholders by demonstrating how data are used and who benefits from data sharing.[12]

NORC is taking a similar approach, collecting references to the General Social Survey (GSS) that demonstrate the impact of GSS data across research fields, from sociology to health science, and research outputs, from dissertations to policy documents. Monitoring data use sheds light on the knowledge communities that benefit from investments in creating and sharing data assets.

The Path Forward

Data-sharing requirements from funders and publishers, coupled with increased capabilities from institutional repositories and archives, are nurturing a dynamic research ecosystem in which it is becoming easier to find and use data for evidence-building activities and to give credit to data producers. As more data become publicly available, it is imperative to develop techniques that enable stakeholders to monitor data use and understand its impact. 

Data citation infrastructure, such as persistent identifiers and AI methods to help surface hidden citations, can provide a path forward. Enhancing the ability of data providers to monitor data use will allow them to answer critical questions about the reach and visibility of their assets across the research ecosystem and inform strategic investments that prioritize equitable access to data.


References

[1] National Academies of Sciences, Engineering, and Medicine. 2022. Automated Research Workflows for Accelerated Discovery: Closing the Knowledge Discovery Loop. Washington, DC: The National Academies Press. https://doi.org/10.17226/26532

[2] NIH Data Sharing Policy. https://sharing.nih.gov/data-management-and-sharing-policy  

[3] NSF Data Sharing Policy. https://new.nsf.gov/funding/data-management-plan

[4] Fenner, M. (2019). Tracking the Growth of the PID Graph. https://datacite.org/blog/tracking-the-growth-of-the-pid-graph/  

[5] Ensuring Free, Immediate, and Equitable Access to Federally Funded Research. 2022. https://www.whitehouse.gov/wp-content/uploads/2022/08/08-2022-OSTP-Public-Access-Memo.pdf  

[6] Advisory Committee on Data for Evidence Building: Year 2 Report. 2022. https://www.bea.gov/system/files/2022-10/acdeb-year-2-report.pdf  

[7] H.R.4346 - CHIPS and Science Act. 2022. https://www.congress.gov/bill/117th-congress/house-bill/4346  

[8] The National Secure Data Service Demonstration. https://ncses.nsf.gov/initiatives/national-secure-data-service-demo  

[9] Lafia, S., Thomer, A., Moss, E., Bleckley, D., & Hemphill, L. (2023). How and Why Do Researchers Reference Data? A Study of Rhetorical Features and Functions of Data References in Academic Articles. Data Science Journal, 22(1), 10. https://doi.org/10.5334/dsj-2023-010  

[10] Lafia, S., Fan, L., & Hemphill, L. (2022). A natural language processing pipeline for detecting informal data references in academic literature. Proceedings of the Association for Information Science and Technology, 59(1), 169-178. https://doi.org/10.1002/pra2.614  

[11] National Agricultural Statistics Service. https://public.tableau.com/views/5WsofNASSDataUsage/The5Ws  

[12] ICPSR Bibliography of Data-Related Literature. https://www.icpsr.umich.edu/web/pages/ICPSR/citations/  

Suggested Citation

Lafia, S. (2024, December 12). Monitoring Data Use to Demonstrate Impact. [Web blog post]. NORC at the University of Chicago. Retrieved from https://www.norc.org.

