Health Services Research & Development

2023 HSR&D/QUERI National Conference Abstract

1008 — Acceleration and improvement of COVID-19 case review with machine learning and deep language models

Lead/Presenter: Kelly Peterson, VHA Office of Analytics and Performance Integration (API)
All Authors: Peterson KS (VHA Office of Analytics and Performance Integration (API)); Brannen J (VA Office of Clinical Systems Development and Evaluation (CSDE)); Chapman A (VA Salt Lake City Health Care System; Division of Epidemiology, University of Utah); Pham R (VA Office of Information and Technology (OIT)); Turano A (VA Office of Information and Technology (OIT)); Stevens V (Informatics, Decision-Enhancement and Analytic Sciences Center (IDEAS) Center, VA Salt Lake City Health Care System; VA Office of Clinical Systems Development and Evaluation (CSDE); Division of Epidemiology, University of Utah, Salt Lake City); Jones M (Informatics, Decision-Enhancement and Analytic Sciences Center (IDEAS) Center, VA Salt Lake City Health Care System; Division of Epidemiology, University of Utah); Plomondon M (VA Office of Clinical Systems Development and Evaluation (CSDE)); Box T (VHA Office of Analytics and Performance Integration (API)); Francis J (VHA Office of Analytics and Performance Integration (API))

Objectives:
Many Veterans are tested for COVID-19 outside the VA and therefore lack structured documentation of positive tests. To increase capture of these cases, a rule-based natural language processing (NLP) system was implemented to identify positive tests reported in clinical text. These candidate cases required manual review to maintain a high positive predictive value in the VA National Surveillance Tool for COVID-19. As the volume of potential cases grew, we explored machine learning methods to identify highly probable cases and reduce the manual review burden.
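To make the rule-based step concrete, the following is a minimal, hypothetical sketch of how positive-test mentions might be flagged in clinical text with hand-written patterns and simple negation handling. It is an illustration only; the deployed VA NLP system is far more extensive, and all pattern names here are assumptions.

```python
import re

# Hypothetical rule sets (illustrative, not the deployed VA rules):
# a mention is flagged positive only if a positivity pattern fires
# and no negation pattern fires.
POSITIVE_PATTERNS = [
    re.compile(r"\bcovid(?:-19)?\b.{0,40}\b(?:positive|detected)\b", re.IGNORECASE),
    re.compile(r"\b(?:positive|detected)\b.{0,40}\bcovid(?:-19)?\b", re.IGNORECASE),
]
NEGATION_PATTERNS = [
    re.compile(r"\b(?:not|no|negative for|denies|ruled out)\b.{0,30}\bcovid", re.IGNORECASE),
    re.compile(r"\bcovid(?:-19)?\b.{0,30}\b(?:negative|not detected)\b", re.IGNORECASE),
]

def flag_positive_covid_mention(note: str) -> bool:
    """Return True when the note asserts a positive COVID-19 test."""
    # Negation takes precedence over any positivity match.
    if any(p.search(note) for p in NEGATION_PATTERNS):
        return False
    return any(p.search(note) for p in POSITIVE_PATTERNS)
```

Rule-based flags like this tend to have good sensitivity but imperfect precision, which is why the abstract's workflow routes flagged cases to manual review.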

Methods:
We trained XGBoost models to predict case positivity from both structured data (e.g., health factors, orders, diagnosis codes) and unstructured data (i.e., clinical text). Separately, deep transformer language classification models were trained on clinical text to independently predict case positivity; this text prediction was then incorporated as a feature in the XGBoost model. Five off-the-shelf language models, which varied in the text sources on which they were pretrained, were evaluated. All models were periodically retrained to address data drift; each retraining included a random search over hyperparameters (e.g., number of training iterations and maximum tree depth). Model performance was evaluated by positive predictive value (PPV), sensitivity, and area under the receiver operating characteristic curve (AUROC) using cross-validation and, per standard practice, on a validation set held out from training. Performance statistics are reported for the most recent model as of July 2022.

Results:
Over 122 million clinical documents have been processed to generate predictions for more than 2.6 million Veterans, identifying 47.1% of the 688k positive COVID-19 cases to date. Evaluation of the current model on a validation set of 16,971 cases showed that, for confirmed positive cases, the PPV, sensitivity, and AUROC were 86.4, 96.3, and 89.6, respectively. The best-performing language model for text classification was a BERT-based model whose training included non-VA clinical notes and COVID-19 biomedical literature. Since the initial model was deployed in July 2020, seven successive models have been deployed to account for the evolving language around COVID-19 captured by expert reviewers.

Implications:
Machine learning models demonstrated a high level of accuracy and helped expedite clinical review during the rapidly changing COVID-19 pandemic. However, the system still required human-in-the-loop review because of the very high positive predictive value demanded by operational use cases. Serendipitously, expert review identified data drift, which necessitated periodic model retraining. We demonstrated the successful addition of machine learning models to an existing rule-based natural language processing pipeline as we moved from a data-poor (early pandemic) to a data-rich (late pandemic) setting. These models support a hybrid machine-expert system that has identified a substantial proportion of all identified Veteran cases to date.

Impacts:
Deployment of an accurate machine learning model for case review prioritization contributed to more accurate and complete reporting of VHA COVID-19 cases, as nearly half of all known cases come from this process. These models were a successful addition to the VA National Surveillance Tool because of their strong performance, human experts in the loop, and periodic updating to account for data drift.