Health Services Research & Development

2023 HSR&D/QUERI National Conference Abstract

3010 — Natural Language Processing for Health Services Research: Design, Development & Decision

Lead/Presenter: Suzanne Tamang, Resource Center - HERC
All Authors: Tamang SR, Palo Alto; Reeves RM, Tennessee Valley Healthcare System; Gobbel GT, Tennessee Valley Healthcare System

Workshop Objectives:
Provide guidance, best practices, and data-driven design options to members of the health services research community who are interested in using unstructured data sources in the electronic health record and who need to decide whether and how to use natural language processing (NLP) methodologies. Participants will be able to (1) explain NLP concepts relevant to understanding various language modelling approaches and their application to the VA's TIU database and other unstructured data sources (e.g., radiology reports, health factors data); (2) describe a framework that can be used to guide the development of an NLP study to address an open HSR question, including ascertaining whether the NLP component is well justified and appropriately framed with respect to the source data, specific NLP technique(s), and validation strategy, and ensuring that the design is aligned with the expertise of the study team; and (3) apply this framework to assess the appropriateness of NLP for example questions and identify key elements of a proposed research strategy and evaluation.

Activities:
Our workshop will provide a "crash course" on clinical NLP, seeking to demystify common NLP jargon so that participants can apply NLP techniques to clinical text within VA data sources. We will share and describe an NLP study planning framework that participants can use to help design an HSR study. We will present sample NLP design considerations, ranging from whether to use NLP at all and preparatory-to-research activities, through how to ascertain gaps in information from both structured and unstructured sources, to strategies for the computational environment, data labelling costs, and the pros and cons of rule-based vs. probabilistic systems. Working in breakout groups, participants will have the opportunity to apply what they have learned to example HSR studies, culminating in a whole-group discussion and reflection led by the workshop facilitators.

Target Audience:
Health services researchers at all levels who are interested in leveraging the treasure trove of data in the VA's database for HSR studies. We envision directing our activities toward both novices interested in the potential use of NLP within the healthcare research community and informatics-oriented healthcare researchers interested in building clinical models, whether as consumers of NLP-derived variables to import into downstream models or as researchers who have done some language modelling themselves.

Assumed Audience Familiarity with Topic:
We assume that participants will have a basic understanding of healthcare data at the VA. However, we expect that only a relatively small proportion of participants will have implemented NLP methods in VA data and that most of the audience will be novices to clinical NLP.